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ABSTRACT 


Spatiotemporal clustering is the process of grouping objects based on both their spatial 
and temporal similarity. This approach is useful when considering the distance between 
objects and how that distance changes through time. Spatiotemporal clustering analysis is 
applied to the maritime domain in this thesis, specifically to a defined area of water, 
during a period of time, in order to gain behavioral knowledge of vessel interactions and 
provide the opportunity to screen such interactions for further investigation. The 
proposed spatiotemporal clustering algorithm spatially clusters vessels in the water space 
using k-means clustering analysis, kinematically refines the clusters based on the 
similarity of vessel headings, speeds and the distance between them, and temporally 
analyzes the continuity of membership of the kinematic clusters through time to 
determine which clusters are moving. The algorithm is implemented in the MATLAB 
programming environment, verified with a synthetic data scenario, and validated with 


two real-world datasets. 
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EXECUTIVE SUMMARY 


Maritime domain awareness (MDA) presents a continuous challenge to national strategic 
decision makers and technical analysts. MDA is based on the concept that maritime 
security is achieved or improved through developing an understanding of events 
occurring in the maritime domain. The ability to autonomously classify vessel movement 
at sea in order to gain behavioral knowledge of a water space presents a significant 
challenge, especially when more than 90% of the world’s commerce is conducted by sea, 
and non-traditional maritime threats, counter-proliferation, and piracy are increasingly 
more important to national security. Previously, much effort has been placed into the 
development of algorithms that employ time-series analysis to estimate vessel position 
and predict future vessel location and that determine what normal, and by extension, 
abnormal behavior of vessels at sea can be considered to be. These efforts support the 
creation and maintenance of the recognized maritime picture (RMP) but lack analysis of 


how ships interact and, by extension, a behavioral knowledge of a water space. 


The objective of this thesis is to develop a method to autonomously analyze and 
classify ship movement and possible intent at sea in order to gain behavioral knowledge 
of a specific water space. The approach utilizes spatiotemporal clustering (STC) in which 
spatial relationships between objects are studied as they change over time. The STC 
concept has been applied to other areas, including urban combat environments, 
georeferenced mobile device tracking, traffic incidents, and the spread of the avian 
influenza H5N1. A spatiotemporal extension of STC algorithms designed to analyze 
urban combat environments is used as a method of classifying paralleling and following 
movement behavior in the maritime domain because vessels involved in illicit activity 
may exhibit these behaviors. The proposed STC algorithm for MDA application is 
modeled in the MATLAB programming environment, and results for both simulated and 


real-world scenarios are presented. 


Although there are several methods to perform spatiotemporal analysis, in this 


thesis the process of first applying a k-means proximity filter, kinematically clustering 


XV 


vessels at each unique time-step based on likeness of course, speed, and distance, and 


then performing temporal analysis, is proven to be effective in the maritime domain. 


Both a simulation and real-world data analysis are presented using the MATLAB 
programming environment. The simulation scenario is presented to test operability of the 
kinematic clustering parameter thresholds of the STC algorithm and to verify 
functionality of text and visual outputs. Two real-world datasets, one taken from 
worldwide automatic identification system (AIS) position reports and the other from a 
global positioning system (GPS) source, are used with a hypothetical scenario to illustrate 


the possibilities revealed by this method of analysis. 


In the first real-world data analysis, a hypothetical scenario is presented in which 
a vessel of interest (VOI) might have transferred illicit cargo on a specific date in the 
Strait of Malacca. On the date of interest there are approximately 5,000 AIS vessel 
position reports in the area of interest. The STC algorithm developed in this thesis is 
applied to the AIS dataset to characterize VOI interactions with other vessels in the area 
of interest and to determine if the VOI interacts with or moves in coordination with any 
other vessels. The results of this analysis provide a realistic understanding of the 
capabilities of the STC algorithm. The VOI is found to be spatiotemporally clustered with 
another vessel at multiple time steps during a nearly four hour timespan. The results of 
the real-world data analysis offer enough insight as to the vessel’s interactions during the 


time period of interest for an analyst’s further investigation. 


The second real-world data analysis is performed on GPS data gathered over a 
twenty-three minute period via GPS transmitters mounted on six vehicles simulating a 
collection of small boats travelling together. The drive-plan for the experiment required 
three of the vehicles to form a convoy and maintain speed and distance for the duration of 
the exercise. The other three vehicles acted as confuser vehicles and moved in and out of 
the convoy. The analysis from the STC algorithm reveals that multiple moving clusters 
are tracked over the twenty-three minute period, which is in accordance with the drive- 
plan. The two real-world data analyses highlight two different possible uses for the STC 


algorithm. 
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I. INTRODUCTION 


Maritime domain awareness (MDA) presents a continuous challenge to national 
strategic decision makers and technical analysts. Defined as “the effective understanding 
of anything associated with the maritime domain that could impact the security, safety, 
economy, or environment of a nation,’ MDA is based on the concept that maritime 
security is achieved or improved through developing an understanding of events 
occurring in the maritime domain [1], [2]. A previous commander of the U.S. Sixth Fleet 
indicated that the interdiction of illegal or terrorist activity in today’s complex 


environment is utterly dependent on maritime domain awareness [3]. 


The gathering and sharing of information and intelligence, between international 
maritime and other partner organizations, in order to develop a recognized maritime 
picture (RMP), is a fundamental component of MDA [3]. In the past, the Department of 
Defense’s (DoD) MDA focus was on understanding opposing navies and maritime 
forces, but more recently this focus has shifted to commercial vessels, as non-traditional 
maritime threats, counter-proliferation, and piracy have become increasingly more 
important to national security [3]. Accordingly, these trends necessitate the tracking, 
managing, and understanding of extensive details about many more vessels, from 


merchant tankers to fishing trawlers and pleasure craft [3]. 


In addition to these emerging trends, the fact that more than ninety percent of the 
world’s commerce today is conducted via the sea results in an even greater need for 
capabilities to track vessels and assess their behaviors [4]. The ability to autonomously 
classify vessel movement at sea, in order to gain this behavioral knowledge of a specific 
water space, is a novel challenge. From a national security perspective, vessels of interest 
(VOD are often found to be moving together in formation, following or paralleling, or 
converging for transfer of illicit cargo. From a simple navigation radar display, an 
example of which is illustrated in Figure 1, to a vessel listing system like the automatic 
identification system (AIS), classifying these types of movement utilizing the tools 
available is a time-intensive process for even a team of analysts. Automating the analysis 
of ship movements through the use of a spatiotemporal clustering algorithm provides a 


flexible, user-interfaced solution. 
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Figure 1. A 24-hour satellite AIS position report collection in the Strait of Malacca. 


A. THESIS OBJECTIVE 


The objective of this thesis is to develop a method to autonomously analyze and 
classify ship movement and possible intent at sea in order to gain behavioral knowledge 
of a specific water space. The behaviors focused on are paralleling and following because 
they are common interactions between vessels at sea and their traits are easily defined. 


Vessels involved in illicit activity may exhibit either of these behavioral traits. 


A spatiotemporal clustering (STC) extension of the urban combat environment 
direction and displacement directivity algorithms presented in [5]-[7] is proposed in this 
thesis as a method of classifying paralleling and following movement behavior in the 
maritime domain. The proposed STC algorithm for MDA application is modeled in 
MATLAB, and results for both a synthetic simulation scenario and real-world dataset 


analyses are presented. 


B. RELATED WORK 


In the maritime domain much effort has been placed into the development of 
algorithms that employ time-series analysis to estimate vessel position, predict future 
vessel location, and determine what is normal, and by extension, abnormal behavior of 
vessels at sea [8]. Specifically, Tunaley describes the development of a ship detection and 
tracking program that is designed to analyze processed imagery rapidly and 
inexpensively and to deliver messages automatically by email [9]. Efforts like Tunaley’s 
support the creation and maintenance of the RMP but lack analysis of how ships interact 
and, by extension, a behavioral knowledge of the water space. One potential solution to 


overcome this shortcoming is STC analysis as proposed in this thesis. 


The STC concept has not previously been applied to the maritime domain but has 
been applied in other areas. Hwang et al. examined the properties of moving objects in 
road networks in [10] and defined spatiotemporal similarity between trajectories based on 
points and times of interest. From their work, early methods were proposed for searching 
for similar trajectories amongst moving objects in road networks, and one of these early 
methods serve as the framework for the STC analysis in this thesis. Si et al. later used 
STC via the space-time permutation scan statistic to analyze the spread of the avian 
influenza H5N1 in poultry, wild birds, and humans in [11]. Their STC analysis aided 
them in determining that H5N1 outbreaks showed a clear seasonal pattern and were 
shown to be relatable to the patterns of migration of wild birds. The space-time 
permutation model was an option for the basis of STC in this thesis, but the STC 
algorithm used in this thesis analyzes the water space of interest in one sweep instead of 


using multiple smaller area sweeps. 


Yuan and Raubal expanded on the STC concept and provided a framework for the 
extraction of spatiotemporal knowledge from georeferenced mobile phone devices and 
other information and communication technologies [12]. In [13], Eckley and Curtin 
presented methods for performing spatiotemporal analysis, with special attention given to 
the interpretation of the results for traffic incidents and also presented arguments for 


performing spatial and temporal analyses independently. The works presented in [12] and 


[13] expanded upon the use of STC to analyze data and explored the idea of extracting 


information from the analysis results, similar to the intent of the work in this thesis. 


Das et al. created STC algorithms to explore urban combat environments in [5]-[7] 
to introduce a method of analyzing troop movements and interactions on land. Their use of 
STC in combat environments was the genesis of the idea to extend spatiotemporal analysis 
to the maritime domain to study the interactions of vessels at sea. An extension of the 


direction and displacement directivity algorithm presented in [5]-[7] is used in this thesis. 


C. THESIS OUTLINE 


The outline of this thesis is as follows: background information on several topics 
including the United States Navy MDA concept, an explanation of one of the data 
sources used in this thesis, spatiotemporal clustering and its fundamental parts, and k- 
means clustering are presented in Chapter II. The STC algorithm components of data 
conditioning, cluster preprocessing, proximity filtering, kinematic filtering, temporal 
analysis, and the post-processing of spatiotemporal results to create easily understood text 
and visual outputs are discussed in Chapter II. The set-up and results of the simulation 
scenario and real-world data analyses are detailed in Chapter IV. A summary of the work 
completed, the significant results, and ideas for future work are provided in Chapter V. 


The MATLAB code that implements the STC algorithm is included in the Appendix. 


Il. BACKGROUND 


A brief overview of the DoD MDA concept was provided in Chapter I. In this 
chapter, a more detailed description of the Navy’s approach to MDA is presented in order 
to frame the current methods being utilized and bring to light the shortfalls faced by these 
methods. A discussion of a typical data source used in real-world analysis is presented, 
along with alternative sources. The STC concept and spatial and temporal data are 
defined. The advantages and disadvantages of three methods of STC are discussed, and 
an overview of the k-means clustering algorithm is provided. At the end of this chapter, 
the reader will have the necessary understanding to comprehend the details of the STC 


algorithm prior to its application to synthetic and real-world datasets. 


A. NAVY MARITIME DOMAIN AWARENESS CONCEPT 


The Navy MDA concept was elaborated on in May 2007 by then Chief of Naval 
Operations (CNO) Admiral Michael Mullen. The memorandum he signed provided a 
framework to prioritize the MDA efforts across the Navy, to ensure alignment with 
external MDA initiatives, and to outline the fleet MDA concept of operations (CONOPS) 
[14]. The implementation of MDA guidance was intended to take ten years, so efforts to 
improve MDA are ongoing and remain relevant [14]. Vice Admiral Nancy Brown, then 
Director, Command, Control, Communications and Computers (C4) Systems on the Joint 
Staff, summarized the importance of the Navy’s role in MDA in 2010 when she indicated 
that with the astronomical number of vessels at sea, the purpose of MDA is to gain an 
understanding of the small number of them that are involved in illicit activity, and that 


mission on the high seas is owned by the Navy [14]. 


The CONOPS for MDA organizes the MDA process into five activities. The 


activities are monitor, collect, fuse, analyze, and disseminate and are detailed in Table 1. 


Table 1. The U.S. Navy fleet maritime domain awareness process. From [3]. 


[—wenor [cost [Fane | Anaize | Dissinat_ 


Vessels Open-Source Data | Tracks with Tracks | Pattern Recognition | Display Networks 
People Intelligence Data with Data and Analysis — Indications and 
Cargo Partner-Nation Data | Track with Data Anomaly Detection Warning 
Organizations Sensors Threat Identification | Assessments 


Facilities Operators Estimates 


Infrastructure Field Personnel os and Cues for 
Areas of Interest Intelligence al 

Sea Lanes Agencies 

Threats 

Friendly Forces 

Weather 





MDA analysts and maritime operation centers (MOC) should have the ability to 
persistently monitor, access, and maintain information on vessels and craft, cargo, vessel 
crews and passengers, maritime infrastructure, and other identified areas of interest. 
Legally protected personal information and private sector proprietary information should 
be safeguarded and protected in accordance with U.S. and DoD regulations. The MOC 
should also be efficient in collecting, fusing, analyzing, and disseminating information to 
facilitate effective understanding and threat detection to decision makers and partner 
MDA organizations [3]. The introduction and development of a STC algorithm to support 
MDA seeks to support the Navy’s CONOPS for the MDA mission as described above. 


B. DATA SOURCE OPTIONS FOR STC ANALYSIS 


The STC algorithm is designed to ingest data in a particular manner, taking into 
consideration the position, course, and speed of vessels at sea. The data sources for the 
real-world analyses presented in Chapter IV are derived from AIS and the global 
navigation satellite system (GNSS), the broad term for the global positioning system 
(GPS). AIS is used because of its availability and the ease with which it can be formatted 
and used. In order to employ the algorithm, three requirements for a data source must be 
met. The first is that the data must provide knowledge of georeferenced vessel position, 
ideally in degrees of latitude and longitude. A source would be compatible as long as 
degrees of latitude and longitude can be extrapolated. The second requirement is that the 


data must provide knowledge of the course and speed associated with each vessel. The 


6 


third requirement is that each vessel position report include a time reference. With these 
three requirements met, other parameters such as the vessel identifier can be defined 
locally, as the global identifier need only identify individual vessels to the program user. 
Various other data sources could be used with this algorithm, including data derived from 
intelligence, surveillance, and reconnaissance systems, as long as the three requirements 


above are met. 


A disadvantage of using AIS as the primary data source is that a particular VOI 
may not properly report position or even transmit its AIS signal. Further, because only 
certain classes of vessels are required to operate AIS, a VOI may not be outfitted with the 
system. In this thesis the assumption is made that a VOI will properly report its position 
because vessels that are involved in illicit activity will often attempt to act as normal as 
possible in order to detract attention from them. Another assumption is that many vessels 
that are of interest for national security purposes fit the criteria to need the use of AIS 
onboard. The use of GNSS as a data source in the second scenario provides an alternative 
to AIS. GNSS does not have the above disadvantages, nor do the assumptions above need 


to be made. Those criteria are outlined in Chapter IV. 


C. SPATIOTEMPORAL CLUSTERING 


Kisilevich et al. [15] define STC as the process of grouping objects based on their 
spatial and temporal similarity. STC is a relatively new approach to data mining and has 
been primarily used in mobile device and other location-based data tracking problems. In 
its application to the maritime domain, STC analysis is used to detect paralleling and 
following behaviors between vessels. Paralleling behavior is when one vessel traces the 
same or similar spatial pattern at the same time as another vessel but is offset in space [7]. 
Following occurs when one vessel traces the same or similar spatial route as another 
vessel at a later time [7]. To provide a better understanding of STC it is necessary to 
define what spatial and temporal datasets are and how they can be analyzed to create a 


STC result. 


1. The Relationship Between Spatial and Kinematic Data 


Spatial data contains information that describes the location of objects in relation 
to Earth, to each other, or to another frame of reference. A metric of distance is used 
when describing spatial relationships between objects. Several distance metrics exist, but 
two of the most common are the Manhattan and Euclidean distances. The Manhattan 


distance d,,, between two points (x,,y,) and (x,,y,) is computed as the sum of the 


absolute differences of their Cartesian coordinates and is mathematically described as 


[16] 
dun =|x,-x,|+|y,-y,]- (1) 


This can be more easily thought of as the number of city blocks we would travel between 


points and is most relevant when measuring distance in a grid-constrained context [16]. 


Euclidean distance, on the other hand, is the more common straight-line distance 
metric and is defined as the length of the line segment that connects two points [16]. The 
Euclidean distance d,,, between two points (x,,y,) and (x,,y,) is mathematically 


described as [16] 


dey = (x,-x,) +(y,-9,) (2) 


To understand the underlying behaviors of paralleling and following, kinematic 
information is also required. Kinematics describes the motion of points in a particular 
frame of reference, and kinematic similarity between nearby objects can suggest 
collusion and other behaviors of interest [17]. In this maritime application, vessel 
characteristics like course and speed will factor largely into determining vessels that are 
exhibiting paralleling or following behavior. For this reason the STC algorithm does not 
exclusively spatially cluster vessels at sea, but rather does so kinematically by 


considering course and speed characteristics in addition to spatial relationships. 


2. Temporal Data 


Temporal data represents variable-state variation in time. It is often collected to 


analyze weather patterns, monitor traffic conditions, or study demographic trends, among 
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many other applications. Any variable that changes over time can be organized into a 


temporal dataset. The fifth stage of the STC algorithm is temporal data analysis. 


3. Spatiotemporal Clustering Approaches 


As stated, in this thesis STC analysis is applied to the maritime domain in order to 
autonomously classify vessel movement at sea in order to gain behavioral knowledge of a 
water space, specifically to determine when vessels exhibit paralleling or following 


behavior as an indicator of possible illicit behavior. 


Hwang, Kang, and Li [10] propose three methods to perform STC of objects 
moving on road networks. These methods can be extended to moving vessels at sea. In 
the first method, clusters are formed based on spatial clustering and are then refined with 
temporal analysis. In the second method, clusters are formed based on temporal 
clustering and then refined with spatial analysis. In the third method, clusters are formed 


based on the simultaneous consideration of spatial and temporal analyses. 


In this thesis the first method is used as the general approach to STC. The second 
method is not easily translated to the maritime domain because temporal clustering would 
yield much larger clusters in need of kinematic refinement and, in turn, create a more 
difficult problem. This would not be an efficient approach to analyzing a water space that 
has high traffic density, like a major strait. The third method is likely the most robust of 
the three, but because of the time-latent nature of maritime domain data, it is not 
necessary to simultaneously consider space and time. The simultaneous accounting of 
kinematic and temporal attributes creates a larger workload and leads to inefficiencies in 
the STC algorithms’ performance. Based on the first method above, vessels at sea are 
kinematically clustered and then refined through temporal analysis to form 


spatiotemporal results which can be processed for further understanding. 


D. K-MEANS CLUSTERING ANALYSIS 


A version of the k-means clustering algorithm was first proposed in 1957 as a 
technique for pulse code modulation, but the term “k-means clustering” came into use 


when McQueen [18] published the term in 1967 [19]. K-means clustering seeks to 


partition a set of n observations (5h can) into a set c of k clusters C= 1eice |, 


where both the observations and clusters are vectors of specified dimension. The k-means 
clustering process can be described as the optimization of the objective function [20] 


= argminY, Y |p, 3) 


jal xj€ cj 


where Ll; is the centroid of observations in Cc}. The centroid of a cluster is defined as the 


mean of the observations in the cluster as 


M=—>x, (4) 


q i=l 
where x, are members of the cluster and N, is the number of observations in the cluster. 


The most common form of the k-means algorithm uses an iterative refinement 


approach with two steps. In the assignment step an observation x, is assigned to the 


cluster whose centroid is closest to it using the nearest neighbor rule 
d(x,,4,) <d(x,,u,) forj#k and 1< j<k, (5) 


where the distance d is defined as 


2 


d(x;.H,) = > (4, -%,) - (6) 


m 


Once all observations have been assigned to a cluster, the update step calculates 
the new centroid of each cluster. The algorithm then goes back to the assignment step and 
reassigns the observations using the new cluster centroids. The algorithm is complete 


when the observation assignments no longer change [20]. 


The k-means clustering algorithm uses the Euclidean distance metric when 
minimizing the within-cluster sum of squares and calculating the mean in the assignment 
step. The main drawback to k-means clustering is that the number of clusters to be used 
k is an input parameter to the algorithm and a poor choice for k can lead to poor results. 


As such, any use of the k-means algorithm should include a discussion regarding the 
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selection of k. An example of the result of k-means clustering is given in Figure 2, in 


which k = 2 and the data points associated with each cluster are colored red and blue. 


® Cluster 1 
® Cluster 2 


M  Centroids 


Y-aniz pogition 
1 





X-axds position 


Figure 2. K-means clustering result on a random set of data points when k=2. 
After [21]. 


In this chapter an overview of the Navy’s approach to MDA was presented to 
frame possible uses of the STC algorithm. A discussion of AIS data and alternative 
sources was discussed. Spatial and kinematic data were conceptualized, and an overview 


of the k-means clustering algorithm was provided. 
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Wt. SPATIOTEMPORAL CLUSTERING ALGORITHM 


A background of the STC concept was presented in Chapter II. Specifically, the 
relationship of spatial and kinematic data was addressed, and three methods of STC were 
presented. The method of forming clusters kinematically and performing temporal 
analysis to form spatiotemporal clusters is applied to the maritime domain in this chapter. 
An overview of the proposed algorithm is presented, followed by an in-depth explanation 
of the progression and development of the STC algorithm, with a focus on its six stages: 
1) data conditioning, 2) cluster preprocessing, 3) proximity filtering, 4) kinematic 


filtering, 5) temporal refinement, and 6) post-processing of spatiotemporal results. 


The algorithm begins with an input dataset consisting of vessel position reports 
during a specific timeframe. After user-provided input for parameter definitions, the data 
is submitted to a data conditioning stage in which time indices are assigned and the data 
is formatted, filtered, and converted. The proximity filtering stage spatially refines the 
position attributes of the vessel position reports, and through kinematic filtering, clusters 
of vessels are formed at each time-step based on similarity of vessel courses, speeds, and 
distance between them. The kinematic clusters are temporally refined to determine which 
of them can be considered moving clusters and which are only present at one instance in 
time. Once vessel position reports have been analyzed both kinematically and temporally, 
the results are spatiotemporal in nature and completely describe how the kinematic 
relationships between vessels change over time. Behavioral knowledge of the water space 
can then be gained through post-processing of the results and interpretation of the text 
and visual outputs. The progression of the six stages of the STC algorithm along with 


data input and behavioral knowledge output are depicted in Figure 3. 
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Figure 3. Top-level view of the spatiotemporal clustering algorithm. 


A. INPUT 


As illustrated in Figure 3, there are two types of input provided to the STC 
algorithm. The first is the collection of vessel position reports, which is the bulk data the 
STC algorithm analyzes to gain behavioral knowledge of the water space of interest. The 
second input is the user-provided input. Different parameters must be defined, either 
through use of default values or through manual input, to enable the various stages of the 


STC algorithm to function. 


1. Collection of Vessel Position Reports 


Vessel attribute data reports, which are transmitted by vessels as discussed in 
Chapter II, containing information on position, course, speed, a timestamp of the report, 
and other attributes this work does not concern, are collected from a suitable data source. 
A vessel identifier, such as MMSI, is typically supplied, and a local identifier may be 
defined, if necessary. A typical data source for this application is AIS information. In this 
thesis, AIS is the primary data source used because of its availability. The International 
Maritime Organization (IMO) requires AIS to be fitted aboard international voyaging 
ships with gross tonnage of greater than 300 tons and all passenger ships regardless of 
size [22]. The IMO requirement means that most sea-going commercial vessels are 
outfitted with and required to transmit position reports in AIS. AIS is, therefore, a 
suitable selection as the input data source for the development of a STC algorithm aimed 
at improving MDA. A typical AIS position report dataset is formatted into eleven fields, 


the titles of which are provided in Table 2. 


The input dataset provided to the STC algorithm is a collection of AIS data for a 
specific period of time in a specific location. For the purposes of the STC algorithm, the 
data fields required are MMSI, speed over ground, longitude, latitude, course over 
ground, and the Coordinated Universal Time (UTC) timestamp. These six fields must be 
formatted and placed in the correct order to be analyzed and understood by the STC 
algorithm. This formatting occurs in two stages, data conditioning and cluster 


preprocessing, which are discussed following a description of user input. 


ie) 


Table 2. Typical field headings of an AIS vessel position and attribute report. 























MID MMSI 
Nav Status Rate of Turn 
Speed Over Ground Longitude 
Latitude Course Over Ground 
Heading UTC Timestamp 
Source 











2. User Input 


The algorithm requires four mandatory and five optional user inputs. The four 
mandatory user inputs are: 1) data source, 2) parameter selection method, 3) location 
filter definition, and 4) minimum vessel speed for consideration. The five optional user 
inputs, each of which has an associated program default value, are: 1) time window 
length, 2) maximum distance between contacts, 3) maximum heading difference between 
contacts, 4) maximum speed difference between contacts, and 5) moving cluster 


threshold value. Both the mandatory and optional user inputs are summarized in Table 3. 


a. Mandatory User Inputs 
(1) Data Source and Type 


The first mandatory user-provided input is the data source 
positional data type. Positional data is provided in either degrees of latitude and longitude 
or in Cartesian coordinates. Positional data in degrees of longitude and latitude require 
conversion to the x-y coordinate grid for STC algorithm analysis, with zero degrees 
longitude and latitude serving as the origin. Conversely, data presented directly in the x-y 
coordinate grid are excused from the data conversion and preprocessing step. Both types 


of data then undergo time indexing in order to create time-step assignments. 


16 
































Table 3. | Summary of mandatory and optional user inputs with default values and 
selection ranges. 

Variable Mandatory/Optional Default Value Selection Range 
Data Source and Synthetic or Real- 
Type Mandatory - World 

Parameter Niaadator : Default Values or 
Selection Method pone Manual Input 
Central Latitude 
Location Filter Riedie - and Longitude, 
Definition y and Width and 
Height of Box 
Mamma yess Mandatory 2 Knots 0 — 30 Knots 
Speed 
Tae neon Optional 12 Minutes 1 — 120 Minutes 
Length 
Maximum ; 
Distance Between Optional 8 Nautical Miles OF) Nauneal 
Miles 
Contacts 
Maximum 
Heading Optional 39° 0° to 359° 
Difference 
ee unrepeee Optional 16 Knots 0 — 30 Knots 
Difference 
Moving Cluster : 
Threshold Value Optional 60% 0% - 100% 

















(2) Parameter Selection Method 


The second mandatory user-provided input required for program 


operability is the parameter selection method. When prompted, the user must decide 


whether to use program default values or to manually input them for STC algorithm 


calculations. The default values are explained in detail as part of each parameter’s 


discussion that follows. Manual user inputs for each parameter have a high degree of 


freedom and are bound only by technical constraints (e.g., a vessel can only have a course 


heading between 0° and 359°). 


17 





(3) Location Filter Definition 


The third mandatory user-provided input defines the location of 
interest for analysis. This input requires four attributes: 1) central latitude, 2) central 
longitude, 3) height, and 4) width. All four parts are defined in tenths of degrees of 
latitude and longitude and form a rectangular shaped bounding box. With this input the 
user can tailor the geographical region of interest for STC analysis. 


(4) Minimum Vessel Speed 


The fourth mandatory user-provided input is the minimum vessel 
speed for the STC algorithm to consider. The default value is two knots, but the user can 
select any speed value from zero to thirty knots. All contacts traveling at or below the 


selected speed are eliminated from consideration by the STC algorithm. 


b. Optional User Inputs 


The five optional user-provided inputs each have an associated default 
value. If the user opts for the manual parameter selection method, the following five 
parameters require specific assignment. 


(1) Time Window Length 


In order to make proper time-step assignments, the STC algorithm 
requires a defined time window length in minutes, as well as a defined start hour and start 
minute. The time window length serves as the minimum amount of time that can exist 
between each individual contact’s position reports. If a contact reports itself more than 
once per time window, a function of the STC algorithm averages the multiple reports into 
one representative contact report. The values available for user selection range from one 
to one hundred and twenty minutes. The default setting is twelve minutes and is derived 
from the six minute rule of navigation, which says that the speed at which a vessel is 


traveling divided by ten is the distance it will travel in six minutes and is defined as 


d=— (7) 
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where d is the distance in nautical miles traveled in six minutes, and s is the speed in 
knots of the vessel [23]. The start hour and start minute are defined by the user as the 
starting time from which to calculate time steps based on time window length. For 
example, if the start hour is 01 and the start minute is 00, and time window length is 
twelve minutes, time-step one will be from 01:00:00-01:11:59 and time-step two from 
01:12:00-01:23:59. 


(2) Maximum Distance Between Contacts 


During the kinematic clustering of contacts there are four factors 
taken into consideration. The first parameter input to kinematic clustering is the maximum 
distance between contacts measured in nautical miles. The value of this parameter is 
subjective as to how far apart two vessels can be and be considered to be paralleling or 
following each other in a manner of interest. Two logical values for this parameter are line- 
of-radar-sight and line-of-visual-sight, which are the distances from which a vessel can be 


detected by radar and visually, respectively, and can be calculated as 


Ratan = 1.23( Yh, + hy } (8) 


h 
R= ! 9 
visual 0.5736 ( ) 





where R is the range in nautical miles, h, and h, are the height of the observer and 


target in feet, respectively [24]. 


When calculated with h, and h, set to the height of the pilothouse of a 


DDG-51 Arleigh Burke Class Destroyer, the values for R and R 


radar visual 


are 
approximately eight and 16 nautical miles, respectively. The default value for maximum 
distance between contacts is set to eight nautical miles. The reason for this is that while 
ships not in visual sight of each other could be exhibiting behavior of interest, illicit 


behavior is more likely to occur when ships are within visual sight. 
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(3) Maximum Heading Difference 


The next parameter input is the maximum heading difference, 
measured in degrees between contacts. When determining the value of maximum heading 
difference to consider, the user subjectively defines what is encompassed in paralleling or 
following behavior. The range of input from which the user can select begins at O° and 
extends to 359°. A key consideration when setting the value for this parameter is the 
distance between the two contacts at the start of the analysis. The default setting for this 
parameter is 39° and is based on the contacts beginning essentially in the same position. 
This value was determined as the angle that is formed for two vessels that are slightly 
more than eight nautical miles apart, the default value for maximum distance between 
contacts, after traveling for sixty minutes, or five time steps, assuming the default value 
for time window length, at a speed of 12 knots. The setup of this is illustrated in Figure 4. 
The values for maximum heading difference when using alternate starting distances 


between contacts us shown in Table 4. 


VJ V1 Contact 2 





Starting 
Point 


8 nm 


Contact 1 


12 nm 


Figure 4. An illustration of the determination of the maximum heading difference of 
two contacts beginning in the same position. 
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Table 4. | Maximum heading difference values for various beginning distances 
between two contacts traveling at 12 knots for sixty minutes. 

















Starting Distance Between Maximum Heading 
Contacts Difference 
0 Nautical Miles 39° 
1 Nautical Mile 34° 
3 Nautical Miles J4° 
5 Nautical Miles 14° 














(4) Maximum Speed Difference 


Similar to the maximum heading difference, the maximum speed 
difference between contacts is a key parameter during the kinematic clustering stage of 
the algorithm. The range of input for maximum speed difference is from zero to thirty 
knots. Selecting a value of zero or near zero indicates that for vessels to be clustered 
together they would need to be traveling at identical speeds. On the other hand, selecting 
a value near thirty, which is considered the higher end of maximum attainable speed of 
ships at sea, would result in this parameter not having a significant role in the clustering 
algorithm. The default for maximum speed difference is set to 16 knots, which was 
determined by considering the following overtaking problem, illustrated in Figure 5. 
Vessel A begins eight nautical miles astern of vessel B and is traveling 16 knots faster. 


After sixty minutes, vessel A is 8 nautical miles ahead of vessel B. 
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Figure 5. An illustration of the overtaking problem to determine the maximum speed 
difference between contacts that begin eight nautical miles apart. 


(5) Moving Cluster Threshold Value 


The moving cluster threshold value that is used to determine 
whether kinematic clusters are stored as moving clusters, or are discarded, must be 
defined. In this selection, the user defines the degree to which the membership of a 
moving cluster can change and still be considered a moving cluster. The sizes of the 
dataset and moving cluster have a large impact on the moving cluster threshold value 
selection. The larger the sizes, the higher the moving cluster threshold value should be. If 
the sizes are small, the moving cluster threshold value should be set to a smaller value to 
allow for better tracking. For example, if the moving cluster contains four contacts, the 
threshold could be set at 50% or below to track the cluster even if two vessels depart. If 
the cluster contains ten contacts, the threshold could be set to 80% to still allow for two 
vessels to depart the cluster. The range of values for selection begins at 0% and extends 
to 100%. A selection of 0% would discount the temporal consideration of the STC 
algorithm, while a selection of 100% would require that a moving cluster not lose any 
members in order to maintain its status as a moving cluster. The default value is 60%, 
which allows a moving cluster of five contacts to lose up to two members and still be 


considered a moving cluster. 


B. DATA CONDITIONING 


Data conditioning is the stage in which the input dataset is formatted and aligned 


for STC algorithm use. First, time indexing, the process of converting UTC seconds to a 
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time-step assignment, an integer greater than or equal to one, is applied to the input 
dataset. UTC seconds are converted to the time and date of the timestamp, and then, with 
the user input for start hour, start minute, and time window length, the time and date is 
converted to a time-step value f. A function designed to ensure that each MMSI or vessel 
identifier has only one position report for each time-step is also applied. If a vessel has 
more than one report in a time-step, the vessel attributes from the multiple reports are 


averaged to form one representative report for the time-step. 


Second, the dataset is formatted and placed in the correct order so that the 
required data fields are properly aligned for STC algorithm use. Each unique AIS vessel 
attribute report is stored as a row in the matrix W in which each column represents a 


vector of different attribute from the AIS data as 


i ee a a (10) 





where © is of size L 


ais X, and L,,, is the number of AIS vessel reports in the input 


dataset. The variables are all vectors of length L 


ays X1 and x and y describe each 


vessel’s position on the x-y coordinate grid, 7 its course, s its speed, m its MMSI, and t¢ 


its time-step assignment. 


C. CLUSTER PREPROCESSING 


Cluster preprocessing is the introduction of vessel identity indexing and the 
decomposition of Y into manageable pieces for STC algorithm analysis. During the 
clustering preprocessing stage, each unique MMSIJ in the vector m is assigned an index 


q beginning at | and continuing until each unique MMST has an assignment. The index 


q will be used as a subscript to identify which vessel’s attributes are being used. 


The W matrix is decomposed by time-step assignment into a three-dimensional 


matrix y of dimension L,x 4x N, where N, represents the maximum number of time 


steps, L, is the number of vessels in each time-step, and He = L,);. The data matrix y 


can be represented at the f -th time-step as 
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y' =| a ae aed i: (11) 





where 
x; yi h 5 
xX, a) i) Sy 
x'= . = Pa | coe en Pe (8 e—— (12) 
Xq Yq l, Sy 


























and q is the vessel identity index. 


D. PROXIMITY FILTERING 


Proximity filtering, accomplished through the use of k-means clustering, 


identifies spatial relationships between vessels at a particular time-step. Only x and y 


position data of y’ is considered. The optimum number of centroids k available for 
assignment is calculated from the length of Y " at the time-step being analyzed as [25] 

L, = 2klog(k). (13) 

To determine the specific centroids at each time-step, the farthest-first 

methodology is used [25]. The x and y position information for the contact in the first 


row of Y is designated as the initial centroid. The second through k-th centroids are 
chosen to be the contact that has the maximum Euclidean distance from the previously 


selected centroid. For example, for t =1 the first contact of y' is designated as the first 


centroid. The second centroid is chosen as the contact has the greatest distance from the 


first centroid, where distance is calculated as 


d= (x,-x,) +(y,-y,) for n= 2, 3,...L,. (14) 


The third centroid is chosen as the point that has the greatest distance d from the second 


centroid and so on. 
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Then the centroids are used to perform k-means clustering on the position data in 


y’ to spatially refine the contacts by assigning each contact to one of the centroids. The 

contacts associated with each centroid j at each time-step f are stored in a matrix C of 

size L, X4 where L, is the number of contacts assigned to each centroid at each time- 
J J 


step as 


=. x’ y' r' s' | forj=1,2,...k, (15) 





where x', y', r', and s' are vectors of size L, X1 containing the respective attributes for 
J 


the vessels in the cluster. After all contacts have been assigned to a centroid, k-means 


cluster assignments for a time-step are stored in a cell array ¢' as 


g =| Gly A hee }. (16) 


A cell array is a set of matrices of different dimensions. At the completion of the 
proximity filtering stage there is a cell array ¢' for each time-step that contains the k- 


means assignment for all contacts in that time-step. 


E. KINEMATIC FILTERING 


Kinematic filtering is used to further refine the clustering results from the 
proximity filter to form clusters of contacts with attributes that exhibit paralleling or 
following behavior as defined by the user. In kinematic filtering, the similarities of course 


and speed attributes of vessels, as well as the distance between them, are determined for 


each k-means cluster. In each cluster c; in &', a seed contact O is determined as the 


vessel with the maximum speed in e to initialize kinematic filtering as 
o =max(s’), (17) 


where s' is a vector of speeds in c’.. To formulate kinematic clusters, three thresholds are 
Pp j 


defined. The heading difference between the seed contact and all other contacts in ci is 


defined as 
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T,, = Ete. for p = 2,3,...L, , (18) 
the speed difference is defined as 
T, = Sq, —Sq,)> (19) 
and the distance between them is defined as 
2 2 
T) = (x, -,,] +(y, -y,,] : (20) 


If T,,,T,, and T,, are less than the user defined thresholds for maximum heading 
difference, maximum speed difference, and maximum distance between contacts, the 


contacts associated with gq, and q, are classified as a kinematic cluster g and are 
represented as 


q 
gj = ; fori 2a, (21) 
P 


where 7 is the number of kinematic clusters for each time-step of é'. The length of @ 


varies and is dependent on the number of contacts that when compared with the seed 


contact result in 7,,,7,, and T,, values that are less than the user defined thresholds. The 
contacts assigned to g from os are removed from further consideration for assignment to 
another kinematic cluster. 

A new seed contact is determined so that T,,,T,, and T,, may be calculated for the 
remaining contacts in C; to discover other kinematic clusters as described in Equations 
(17)-(21). The iterative process continues until all contacts in G, have been considered for 


assignment to a kinematic cluster. Upon completion, a snapshot of the kinematic clusters 


in €' are compiled in a cell array ’ for temporal analysis as 


B=| % Or... G, | (22) 
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An example of a k-means, distance, and velocity kinematic clustering result is 
illustrated in Figure 6 in which there are four kinematic clusters identified at time-step 
two of a generic simulation. The maximum distance between contacts was defined as four 
nautical miles for the simulation, and each of the four resultant clusters are displayed in a 
different color. The contacts of each cluster were initially grouped by proximity filtering 
and further refined kinematically by determining similarities between vessel headings, 


speeds, and the distance between them. 


Poaition in milez from the origin 





“12 -10 -8 -6 4 -2 0 2 4 6 
Position in miles frorn the origin 


Figure 6. An example of k-means, distance, and velocity kinematic clustering results at 
time-step two of a generic simulation. 


F. TEMPORAL REFINEMENT 


Temporal refinement determines which kinematic clusters are moving in time and 
which are not. The input data to this stage are cell arrays that contain the membership of 
the kinematic clusters for each time-step. To determine which kinematic clusters are 
moving in time, it is necessary to calculate the similarity of the membership of the 


kinematic clusters in consecutive time steps. The first kinematic cluster in 4’, @, is 


pe 


designated as the reference cluster, and the intersection of g and each kinematic cluster 


in £'" is calculated as 


1 =length( 9; N9,") for v=1,2,..Lg.- (23) 


t+1 


If an intersection is found between ¢ and a cluster in 4", the variable fuse f is 


calculated as 


I 


~ length( ¢) = 


and the next cluster in £", ¢/,,, becomes the reference cluster. 


If an intersection is not found between @ and a cluster in 8", the intersection is 
calculated between g/ and each kinematic cluster in 6° as in Equation (23). If an 
intersection is found, fuse is calculated as in Equation (24). If an intersection is not found, 
g; is removed from consideration as a moving cluster and the second cluster in f, ¢,, 


becomes the reference cluster, and the process is repeated. 


Once all kinematic clusters in ’ have served as the reference cluster for 


intersection calculations, the kinematic clusters of #’"' beginning with g/*' become the 


1 


reference clusters. This iterative process continues until all the clusters in pes have 


‘< . ie t ~ - 
served as the reference cluster. Kinematic clusters in ee are not considered in 


temporal refinement due to the lack of future kinematic clusters with which to calculate 


intersections. 


When a value other than zero is calculated for f, it is compared to the user- 
defined moving cluster threshold value. If f is greater than or equal to that value, the 


associated kinematic cluster gj is stored to the cell array 0 as 


d={¢}, (25) 
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where 6 is of size 1x L,,. and L,, is the number of moving clusters determined in 


temporal refinement. 


G. POST-PROCESSING OF SPATIOTEMPORAL RESULTS 


Upon completion of temporal analysis, three cell arrays, each containing 
spatiotemporal results, are constructed. Through post-processing of these cell arrays, it is 
possible to build usable outputs to determine information about the selected water space, 
including the detection of moving clusters of vessels that are exhibiting paralleling or 


following behavior and the ability to track vessel interactions over time. 


1. Identifying Members of a Moving Cluster 


The first cell array 6 represents all kinematic clusters that move through time for 
a minimum of two time steps with a minimum threshold of continuous membership in the 


variable fuse f as 


% Oa 


t 


sal fn Hes the O 


Q Dir Pra (26) 


where Z is the maximum time-step that contains a moving cluster. The row location of 
any cluster @ in the cell array O indicates the time steps at which the moving cluster 
exists. For example @,, that appears in the second row, is a moving cluster that begins at 
time-step two. 

Further analysis of this cell array provides details such as when a moving cluster 
forms, which contacts are members of the subject moving cluster, if contacts join or leave 


the cluster, and when, if ever, the cluster’s membership becomes low enough that it can 


no longer be considered a moving cluster. 
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2 Kinematic Clusters Occurring at the Last Time-step of Data 


The cell array is equal to £" at the last time-step of available data and is of 
size 1x L, where L, is the number of kinematic clusters found in /’. Due to a lack of 


future time steps with which to compare this time-step of data, these current clusters are 


stored as a matter of interest to the user as 


(27) 


The clusters in are not known to be moving clusters but represent only the kinematic 
clusters of the last time-step of data. Information detailed in is displayed only in the 


text output, under the “Moving Clusters” section, to avoid unnecessary cluttering of the 


visual output. 


3. Generation of the Contacts of Interest List 


The third cell array, 9, is the same size as O, but rather than containing cluster 
membership information, it contains the global identity index g of contacts that either 


join or leave clusters. Vessels that move together as a moving cluster for several time 
steps are of interest to an analyst, but a vessel that joins a cluster then departs it, and then 
joins another cluster, is also of interest. The information contained in @ is displayed only 


in the text output, under the “Contacts of Interest” section. 


4. Post-processing of Spatiotemporal Results to Form Usable Outputs 


The post-processing of the spatiotemporal results stored in the cell arrays 6, p, 


and @ is the transfer of data into usable outputs for better user understanding. Post- 
processing of the cell arrays results in the generation of two outputs. One output is text 


based and the other is a visual representation of moving vessel clusters in the water space. 


a. Text Output 


The text output has two distinct parts, Moving Clusters and Contacts of 
Interest. The Moving Clusters part provides the global identifiers of the members of each 


moving cluster, as well as the time at which the cluster is formed and the time at which 
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the cluster ceases to exist. Current clusters are also detailed in this manner in the text 


output. The second part of the text output, Contacts of Interest, results from the p cell 


array and contains the global identifier of any contact that joins or leaves an existent 
moving cluster, as well as the time at which the event occurs. If the contact is deemed to 
be leaving a moving cluster, the text output also details the heading and speed of the 
contact’s departure. If there are two or more contacts that depart a larger moving cluster 
and continue to move as a smaller moving cluster, then the heading and speed reported is 
an average of the two contacts. An example of both parts of the text output is presented in 
Figure 7 where there are fourteen moving clusters identified and three contacts of interest 


defined. 


b. Visual Output 


The visual based output is an interactive, MATLAB-generated plot that 
depicts the position and track of each moving cluster analyzed by the STC algorithm. In 
Figure 8, there are several moving clusters displayed, some red, some orange, and some 
green, each color indicating the membership of the moving clusters as defined in Table 5. 
The position, heading, and speed of each moving cluster displayed is computed by 
averaging its constituent members’ respective attributes. For example, for a given moving 
cluster, the speed reported is the average speed of all vessels contained within that 
specific cluster. Moving clusters are marked by an ‘x’ for each time-step that they occur 
and with an ‘o’ at their final time-step of occurrence. If a moving cluster is only present 
for two time steps, it is represented by an ‘o’. The markers and track lines on the plot are 


color-coded to detail the degree of membership that the moving cluster maintains. 
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Moving Clusters: 


Cluster 1 containing contacts 18 19 20 21 22 begins at time 1 and ends at time 1 

Cluster 2 containing contacts 12 13 14 17 begins at time 2, is a current cluster, but gains or loses members 
Cluster 3 containing contacts 18 19 22 begins at time 2 and ends at time 4 

Cluster 4 containing contacts 25 30 begins at time 2 and ends at time 2 

Cluster 5 containing contacts 6 20 21 begins at time 2 and ends at time 3 

Cluster 6 containing contacts 10 12 13 14 17 begins at time 3 and ends at time 3 

Cluster 7 containing contacts 8 27 begins at time 3 and ends at time 3 

Cluster 8 containing contacts 15 16 23 begins at time 3 and ends at time 3 

Cluster 9 containing contacts 12 13 14 begins at time 4 and ends at time 4 

Cluster 10 containing contacts 15 16 begins at time 4, is a current cluster, but gains or loses members 
Cluster 11 containing contacts 3 12 13 14 17 begins at time 5 and ends at time 5 

Cluster 12 containing contacts 20 21 begins at time 6 and is a current cluster 

Cluster 13 containing contacts 18 19 begins at time 6 and is a current cluster 

Cluster 14 containing contacts 5 15 16 begins at time 6 and ends at time 6 


Contacts of Interest: 


Contact 5 joins a cluster at time 6 and remains with the cluster until it departs at time 6. The contact departs on average 
course 3 at average speed of 27 knots. 

Contact 20 joins a cluster at time 1, departs the cluster, and then rejoins it, finally departing at time 6. The contact 
departs on average course 270 at average speed of 20 knots. 

Contact 20 joins a cluster at time 1, departs the cluster, and then rejoins it, finally departing at time 6. The contact 
departs on average course 270 at average speed of 20 knots. 





Figure 7. An example text output from the STC algorithm’s analysis of a generic simulation scenario. 
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—— 100% 

—— 75-9936 
——— 50-7436; 
—— 25-4936 





+] 
o 
= 
Type: lineseries 
Cluster 4 
Time 2 
Average Course: 157 
Average Speed: 23 
Figure 8. An example visual representation output of the STC algorithm’s analysis of a 


generic simulation scenario. 


Table 5. ©The MATLAB interactive plot color-code key for moving cluster 
membership identification. 




















Display Color Percentage Membership Indicated 
Red 100% 
Orange 75% - 99% 
Green 50% - 74% 
Blue 25% - 49% 
Violet 0% - 24% 














33 


The MATLAB plot also has an interactive functionality that is illustrated in 
Figure 8. The user is able to click on any marker of a moving cluster to determine which 
cluster is represented, at which time-step the cluster occurs, as well as the cluster’s 


position, heading, and speed. In Figure 8 it is determined that moving cluster 4 exists at 
time-step two, in position (—8.67,—1.33) heading 157° at 23 knots. To discover which 


contacts compose moving cluster 4, it is necessary to reference the first portion of the text 
output. Upon doing so, it can be determined that the contacts in moving cluster 4 have 


vessel identifiers of 25 and 30. 


In this chapter, the six stages of the STC algorithm were detailed. Vessel data and 
user-provided input were discussed as the primary inputs to the STC algorithm. The data 
conditioning and cluster preprocessing stages, where datasets are time indexed, 
formatted, and aligned for algorithm use, were discussed. Proximity and kinematic 
filtering were defined and the attributes each considers were given. Temporal refinement 
to determine which clusters move through time was detailed, and examples of the usable 
outputs formed from post-processing of spatiotemporal results were presented. In Chapter 
IV, the STC algorithm is verified against a synthetic dataset and is validated using two 


real-world datasets. 
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IV. IMPLEMENTATION AND RESULTS 


The STC algorithm presented in Chapter III is implemented, verified, and 
validated in this chapter. The algorithm is implemented in the MATLAB programming 
environment. To verify operability and validate its possible use, the STC algorithm is 
tested using synthetic and real-world datasets. An overview of the MATLAB functional 
code and user environment is provided, followed by a detailed explanation of the 


synthetic and real-world scenarios and results. 


A. MATLAB IMPLEMENTATION MODEL 


The MATLAB programming code is organized into a top-level function, Thesis, 
which calls various functions to execute the STC algorithm based on user input. An 
overview of the highest-level MATLAB functions and their purposes is found in Table 6. 


The detailed code for these and other functions is included in the appendix. 


1. MATLAB Graphical User Interface 


The MATLAB user interface for operation of the STC algorithm was designed to 
be intuitive and to ensure ease of use. The algorithm’s user-provided inputs discussed in 
Chapter III are provided to the algorithm via the MATLAB graphical user interface 
(GUI) tool, examples of which are illustrated in Figures 9 and 10. 


2. Simulation and Real-world Analysis Parameters 


The thresholds used for the simulation and real-world analyses are defined in 


Table 7 and are the default values as discussed in Chapter III. 
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Table 6. 


MATLAB top-level functional organization of the STC algorithm. 





Function 


Overview 





Thesis 


Thesis is the function call to access spatiotemporal algorithms that 
support maritime domain awareness. The user will select whether their 
input is “Synthetic” or “Real-world,” and the program will determine 
which set of functions to use for analysis. 





RealWorld 


RealWorld is the top-level function call for real-world AIS data analysis. 
This function begins with data pre-processing and filtering, and includes 
all functions required for spatiotemporal analysis of the water space. 





KDVC 


KDVC(A,clusters,time,climit,dlimit,spdlimit) returns the kinematic 
clusters at a given time-step based on user inputs where A is the input 
dataset, clusters is the matrix for output storage, time is the time-step 
being evaluated, climit is the user defined heading difference between 

contacts in degrees, dlimit is the user defined distance between contacts 
in nautical miles, and spdlimit is the user defined speed difference 
between contacts in knots. 





Kinematic 


Kinematic(A,c,climit,dlimit,spdlimit,h) returns the kinematic clusters at a 
given time-step based on user inputs where A is the input dataset, c is the 
seed contact, climit is the user defined heading difference between 
contacts in degrees, dlimit is the user defined distance between contacts 
in nautical miles, spdlimit is the user defined speed difference between 
contacts in knots, and h is the time-step. 








aA” SliderFilt 





Maximum Course Difference (Degrees) Maximum Speed Difference (Knots) 





© <\> 86 


4 > 
0 30 


Maximum Distance Between Contacts (Miles) Minimum Allowable Speed (Knots) 





4 > 
30 


Continue 





Figure 9. The MATLAB graphical user interface with slider and open text parameter 


input options. 
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600 


Choose the Source Data File 


MENU 


| MDA_DataSett.xisx 





RuntV1.xisx 


| Runt V2.xisx 





RuniV3.xisx 





| RuntV4.xisx 





Figure 10. The MATLAB graphical user interface with push-button input. 























Table 7. Synthetic and real-world analyses model parameters. 
Parenece Value for Synthetic and Value for GPS Real- 
AIS Real-World Scenarios World Scenario 
Time Window Length 12 Minutes 1 Minute 
Maximum Heading 
Difference Between 39° 10° 
Contacts 
ee 8 Nautical Miles 0.5 Nautical Miles 
Between Contacts 
Maximum Speed Difference 16 Knots 5 Knots 
Between Contacts 
Moving Cluster Threshold 60% 15% 


Value 











B. VERIFICATION USING SYNTHETIC DATA 


The simulation is run using data that was purposefully created to test the 
functionality of the STC algorithm. Specifically, the synthetic scenario provides proof-of- 
concept that the algorithm can properly evaluate the user-selected values from Table 7 
and that the text output and the interactive aspect of the visual output are operable. The 
data simulation involves a dataset of three hundred vessel position reports taken over a 


period of seventy-two minutes, or six time steps. The setup and results of the simulation 


are presented below. 


a7 





1. 


Simulation Setup 


In the simulation, two separate scenarios are executed. Scenario one, which 


occurs on the top half of Figures 11-16, consists of two contacts and is designed to test 


maximum heading difference between contacts and maximum distance between contacts. 


Scenario two, which occurs on the lower half of Figures 11-16, begins with two groups 


of five contacts each, and is designed to test maximum speed difference between the 


contacts, maximum distance between contacts, and the operability of the moving cluster 


threshold value. The narrative steps for both scenarios, as well as the expectation for 


kinematic clustering, are detailed in Table 8. 























Table 8. | Summary of the vessel interactions for the simulation scenarios. 
Time- : , ; ; 
Sgt Scenario One Expectations Scenario Two | Expectations 
Distance: 9 nm Distance: 9 nm 
1 Heading Dif: 38° No Cluster Speed Diff: 17kts 2 Clusters 
Distance: 7.8 nm Distance: 7.3 nm 
2 Heatine Dit 8" Cluster Speed Diff: 17kts 2 Clusters 
Distance: 6.6 nm Distance: 5.6 nm 
3 Heading Diff. 0° Cluster Speed Diff: 15kts 1 Cluster 
Distance: 6.6 nm Distance: 7.1 nm 
4 Headine Diff: O° Cluster Speed Diff: 15kts 1 Cluster 
: Distance: 5.6 nm 
Dist : 6.6 
5 = pe ff. ae No Cluster | Speed Diff: 15kts 1 Cluster 
Heading Dut: Lose 4 contacts 
. Distance: 4.1 nm 
Dist : 7. 
6 - sera ; : oy No Cluster | Speed Diff: 15kts 1 Cluster 
Heading Diff: 40 
Lose 3 contacts 




















In scenario one, the contacts begin nine nautical miles apart with a course heading 


difference of 38° at time-step one. The contacts are traveling at a constant speed of 18 


knots and at time-step two are 7.8 nautical miles apart with the same course heading 


difference. At this point, because they now fall inside the eight nautical mile threshold 


and have less than a 39° heading difference, the two contacts should be classified as a 
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kinematic cluster. At time-step three, the contacts are 6.6 nautical miles apart and have 


both turned to course 090. With a O° course heading difference, the contacts should 
again be stored as a kinematic cluster. The contacts maintain course and speed through 


time-step four, and should, therefore, be kinematically clustered. At time-step five, the 


contacts turn on outbound courses with a heading difference of 40°. Although they are 
still only 6.6 nautical miles apart, the contacts’ course heading difference has now 
exceeded the defined threshold, and they should no longer be considered a kinematic 
cluster. They continue on their outbound courses at time-step six and should not be 
kinematically clustered. The contacts in scenario one should be considered a moving 


cluster beginning at time-step 2 and ending at time-step four. 


In scenario two, two groups of five contacts each begin nine nautical miles apart 
traveling on identical courses with a speed difference between the groups of 17 knots. 
Due to the speed difference and distance between them, the groups should be classified as 
two separate kinematic clusters, as the five contacts in each group are traveling at the 
same speed. At time-step two, the two groups are inside the eight nautical mile threshold, 
but still maintain a 17-knot speed difference and, therefore, should be clustered 
separately. At this point, each of the groups should be classified as a moving cluster that 
begins at time-step one. The two groups of contacts shift speed at time-step three and 
settle to a 15-knot speed difference. Now, inside the speed difference threshold of 16 
knots, the two groups should be kinematically clustered as one group of ten contacts. At 
time-step four, the contacts shift speed again but maintain their speed difference at 15 
knots. They should again be clustered as one group of ten contacts and now considered a 
moving cluster that begins at time-step three. At time-step five, the contacts maintain 
their speed, but four of the ten contacts fail to transmit a position report in order to test 
the moving cluster threshold value of 60%. The remaining contacts should continue to be 
kinematically clustered, and because 60% of the contacts of the moving cluster remain, 
they should be considered a moving cluster. At time-step six, the contacts maintain speed, 
but three of the remaining six contacts do not have associated position reports. The 
remaining three contacts should be kinematically clustered, but because only 50% of the 


six contacts remain, the moving cluster should be reported to end at time-step five. 
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Zi Kinematic Clustering Results 


The simulation scenarios have been described in detail above, and expectations for 
kinematic clustering and moving cluster analysis have been set. In the discussion to follow, 


the kinematic results at each time-step and the spatiotemporal outcome will be detailed. 


The visual representation of the kinematic clusters at time-step one is presented in 
Figure 11. As expected, the only kinematic clusters displayed are those from scenario 
two, each consisting of five contacts, colored cyan and green and labeled c, and c,, 
respectively, in Figure 11. The black dots represent other contacts in the input dataset that 


are not kinematically clustered in any cluster. 


Position in milez from the origin 





Position in miles fromthe origin 


Figure 11. The kinematic clustering results for time-step one, which reveal two clusters, 
each containing five contacts, are in accordance with the expected outcome. 


The kinematic clustering results for time-step two are shown in Figure 12. In 


addition to the kinematic clusters for scenario two, colored green and cyan and labeled c, 
and c,, respectively, the two contacts from scenario one are now kinematically clustered, 


colored red, and labeled c,. 
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Poaition in milez from the origin 





Position in miles frorn the origin 


Figure 12. The kinematic clustering results for time-step two, which reveal three clusters, 
cyan and green each containing five contacts, and red with two contacts, are 
in accordance with the expected outcome. 


At time-step three, the expectation was for the kinematic cluster from scenario 
one to continue, but for the two kinematic clusters from scenario two to be combined into 
one. The results, illustrated in Figure 13, are as expected. The scenario one cluster is 


colored purple and labeled c,, and the new larger kinematic cluster from scenario two is 


colored cyan and labeled c,. 


The visual representation of the kinematic clusters at time-step four is presented 
in Figure 14. In time-step four, all contacts maintained course and speed and were 


expected to be kinematically clustered as in time-step three. 
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Poaition in milez from the ongin 





Position in miles fromm the origin 


Figure 13. The kinematic clustering results for time-step three contain two clusters: 
green, which is the combination of the two five-contact clusters from time- 
step two, and red, with two contacts. The clusters are as expected. 


Poaition in milez from the origin 





Position in miles frorn the origin 


Figure 14. The kinematic clustering results for time-step four, which reveal the same two 
clusters as in time-step three, are as expected. 
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At time-step five, we do not expect the contacts in scenario one to be 
kinematically clustered due to exceeding the maximum heading difference threshold. As 
illustrated in Figure 15, the cluster from scenario one no longer appears. Four contacts 
from scenario two did not transmit position reports, but the remaining six should still be 
kinematically clustered together. In Figure 15, the cluster from scenario two is colored 


red and labeled c,. 


The contacts in scenario one continue to exceed the maximum heading difference 
threshold at time-step six and should not be kinematically clustered together. Of the 
remaining six contacts in scenario two, three fail to transmit a position report. The 
remaining three contacts should still be kinematically clustered and are illustrated in red 


and labeled c, in Figure 16. 


Position in milez from the origin 





Position in miles fromthe origin 


Figure 15. The kinematic clustering result for time-step five contains one cluster, which 
contains the remaining six contacts from the larger ten-contact cluster in 
time-step four, is tracked as expected because of the moving cluster threshold 
value definition of 60%. 
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Poaition in miles from the origin 





Position in miles fromthe origin 


Figure 16. The kinematic clustering result for time-step six contains one cluster, which 
contains the remaining three contacts from the larger six-contact cluster in 
time-step five, is tracked as expected as a kinematic cluster. 


3. Spatiotemporal Clustering Results 


The kinematic clustering results were all as expected from the description of the 
synthetic simulation scenarios. The STC text results are given in Figure 17. As expected, 
the two groups of contacts in scenario two are considered moving clusters beginning at 
time-step one and ending at time-step two when the two combine to form one larger 
cluster. The memberships of the clusters are detailed as cluster 1 and cluster 2. In Figure 
17, cluster 3 contains the membership of the contacts in scenario one. They begin as a 
moving cluster at time-step two and end at time-step four, as expected. Cluster 4, which 
is the combination of clusters 1 and 2, begins at time-step three and ends at time-step 
four, because of the departure of four contacts in time-step five. Cluster five, the six 
contacts that remain from the large group, is described to begin and end at time-step five. 
This occurs because the group is still considered a moving cluster from time-step four but 


does not continue into time-step six due to the loss of three of the remaining six contacts. 
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Moving Clusters: 


Cluster 1 containing contacts 10 11 12 13 14 begins at time | and ends at time 2 

Cluster 2 containing contacts 15 16 17 18 19 begins at time | and ends at time 2 

Cluster 3 containing contacts 1 2 begins at time 2 and ends at time 4 

Cluster 4 containing contacts 10 11 12 13 14 15 16 17 18 19 begins at time 3 and ends at time 4 
Cluster 5 containing contacts 10 11 12 17 18 19 begins at time 5 and ends at time 5 


Contacts of Interest: 


Contacts 10 11 12 13 14 are a moving cluster that join and depart other larger clusters. 

Contact 13 14 15 16 joins a cluster at time 1, departs the cluster, and then rejoins it, finally departing at time 4. 
The contact departs on average course 253 at average speed of 16 knots. 

Contact 12 17 18 joins a cluster at time 1, departs the cluster, and then rejoins it, finally departing at time 5. The 
contact departs on average course 236 at average speed of 12 knots. 

Contact 1 2 joins a cluster at time 2 and remains with the cluster until it departs at time 4. The contact departs on 
average course 68 at average speed of 14 knots. 

Contact 1 2 joins a cluster at time 2 and remains with the cluster until it departs at time 4. The contact departs on 
average course 68 at average speed of 14 knots. 

Contacts 15 16 17 18 19 are a moving cluster that join and depart other larger clusters. 





Figure 17. The simulation scenario STC text output detailing five moving clusters and multiple contacts of interest. 
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The visual representation of the STC results is given in Figure 18. According to 
the text output in Figure 17, there should be five moving clusters displayed. Upon 
inspection of Figure 18 it is found that there are indeed five clusters, four are red lines. 
The fifth moving cluster is a single cyan mark on the plot immediately above the lower- 
center moving cluster. The moving clusters have been enlarged in Figures 19 and 20 in 


order to further investigate each moving cluster. 


Cluster 3, which contains contacts 1 and 2, has been enlarged in Figure 19. The 
interactive display of MATLAB has been used to select the middle clustering point, and 
it is shown that the point is a point in Cluster 3 at time-step three. The line and all marks 
on it are in red, indicating that the moving cluster maintains 100% membership during its 
life, as expected. The straight-line nature of the moving cluster track in Figure 19 is 


created by averaging the position, course and speed of contacts 1 and 2. 





—— 100% 
—— 75-99% 
——— 50-7498; 
—— 25-49% 
—— 0-25% 

End of Cluster 


Milez From the Crigin 





Miles From the Origin 


Figure 18. A broad view of the visual representation of the five moving clusters 
determined in the synthetic scenario, which is as expected. 
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Figure 19. An enlarged view of the visual representation of moving cluster 3, which 
contains contacts | and 2 in the synthetic scenario, exists between time-steps 
2 and 4, and is as expected. 


The other four moving clusters have been enlarged in Figure 20. The lines on the 
left and right correspond to Clusters 1 and 2 in Figure 17. Through the interactive feature 
of the plot it can be seen that both clusters begin at time-step one and end at time-step 
two when they combine into one larger moving cluster, represented by the center red line 
in Figure 20. Both clusters have red lines and marks associated with them, indicating 
100% membership is maintained for both time steps of their lives. The middle cluster is 
representative of Cluster 4 from Figure 17, which is the combination of the two smaller 
moving clusters. Although one mark of the cluster is hidden by the MATLAB plot 
marker, all marks and the line of the cluster are red, indicating 100% membership during 
its two-time-step life, which begins at time-step three and ends at time-step four. The 
MATLAB interactive plot marker is marking Cluster 5 from Figure 17, which begins and 
ends at time-step five after four of the ten contacts depart the large moving cluster. Under 
the MATLAB interactive marker the mark is cyan in color, indicating the end of the 


moving cluster’s life. 
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Figure 20. An enlarged view of the visual representation of clusters 1, 2, 4, and 5 in the 
synthetic scenario. Clusters 1 and 2 combine at time-step 3 to form cluster 4, 
and cluster 5 is tracked during time-step five as the remaining six contacts of 

cluster 4. 


Ce VALIDATION USING REAL-WORLD DATA 


After successfully evaluating the synthetic data scenarios, the STC algorithm was 
tested on two real-world datasets. For the first analysis, the real-world data source is 
worldwide AIS vessel position reports for the twenty-four hour period beginning at 0001, 
10 January 2012. The water space of interest for the analysis was chosen to be the Strait 
of Malacca. The Strait of Malacca, which is the main shipping channel between the 
Indian Ocean and the Pacific Ocean, is considered one of the most important shipping 


lanes in the world, with more than 60,000 vessels passing through it each year [26]. 


The purpose of the real-world data analysis is to illustrate the STC algorithm’s 
ability to process a robust and relatively large set of data. On 10 January 2012 there were 
slightly more than 1.5 million worldwide AIS vessel position reports, approximately five 


thousand of which occurred in and around the Strait of Malacca. The complete AIS 
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position report picture is shown in Figure 21. Each circle on the plot represents an 


individual AIS position report during the time period of interest. 
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Figure 21. Visual representation of the Strait of Malacca AIS position report dataset for 
10 January 2012. 


if AIS Real-world Data Concerning the Strait of Malacca 


To set up the scenario, we present a hypothetical report of intelligence that the 
motor vessel (M/V) Mairini, with MMSI number 538003897 has conducted an illicit 
transfer of cargo or persons on 10 January 2012 in or around the Strait of Malacca. 
Theater strategic decision makers need to determine if M/V Mairini interacted with or 
traveled with any other ships for any period of time during the day in question. The 
worldwide AIS dataset for 10 January is provided to the STC algorithm, and the results 


are detailed below. 
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2. AIS Real-world Data Results 


Twelve moving clusters are reported as a result of the STC algorithm analysis 
using the threshold values defined in Table 7. The text output results of the Strait of 
Malacca analysis are presented in Figure 22. Through examination of the text output, we 
can determine that the VOI, MMSI 538003897, appears to interact with another vessel on 
the day in question in the Strait of Malacca. By converting the time steps to time of day, 
we can determine that M/V Mairini began interacting with another vessel around 1936 
UTC on 10 January 2012. The STC algorithm text output reports that the VOI and the 
vessel it is clustered with, MMSI 538002853, are a current cluster as well. The latest 
time-step of data that contains kinematic clusters is time-step 116, which converts to 


2312 UTC. An overview of the visual output is presented in Figure 23. 


The underlay of the Google map on the visual output provides situational 
awareness to the user. The MATLAB interactive capability that was highlighted in the 
synthetic simulation provides further insight into the situation. The moving cluster in 
question, cluster 9, has been selected in Figure 23, and its attributes are displayed. The 
cluster was moving on an average course of 126° at a speed of 13 knots at time-step 98, 
and when the two vessels are last clustered at time-step 116, they are moving on an 
average course of 114° at a speed of 14 knots. From the visual representation in Figure 
23, it appears that the moving cluster is on a standard transit course through the Strait of 
Malacca. While the results of the real-world data analysis are not conclusive evidence 
that M/V Mairini was involved in illicit behavior, they do offer enough insight as to the 


vessel’s interactions on the day in question for an analyst’s further scrutiny. 
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Moving Clusters: 


Cluster 1 containing contacts 477093700 477583000 477759800 begins at time 19 and ends at time 19 
Cluster 2 containing contacts 533001330 533281000 begins at time 20 and ends at time 20 
Cluster 3 containing contacts 353525000 355461000 begins at time 87 and ends at time 88 
Cluster 4 containing contacts 352645000 356362000 begins at time 87 and ends at time 87 
Cluster 5 containing contacts 477583000 477759800 begins at time 88 and ends at time 89 


Cluster 6 containing contacts 477621300 477961700 begins at time 88 and ends at time 97 

Cluster 7 containing contacts 416031500 416031800 begins at time 88 and ends at time 88 

Cluster 8 containing contacts 477583000 477946000 begins at time 96 and ends at time 96 

Cluster 9 containing contacts 538002853 538003897 begins at time 98, is a current cluster, but gains or 
loses members 





Figure 22. The STC algorithm text output for the AIS real-world data analysis of the Strait of Malacca on 10 January 2012. 
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Average Speed: 13 


Figure 23. STC algorithm visual output for a AIS real-world data analysis of the Strait of 
Malacca on 10 January 2012. Moving cluster 9 containing two contacts 
travels on average heading 126 at an average speed of 13 knots. 


3 GNSS Real-world Data Concerning Vehicles Imitating Small Boats 


The second real-world data analysis is performed on GPS data gathered over a 
twenty-three minute period from 1514 to 1537 UTC on 24 April 2013 via GNSS 
transmitters mounted on six vehicles driving on Route 3 in Massachusetts. The vehicles 
operate in a manner to simulate small boats traveling together. The drive-plan for the 
vehicles dictated three of the them to form a convoy and maintain speed and distance for 
the span of the exercise. The other three vehicles acted as confuser vehicles and moved in 
and out of the convoy. One of the convoy vehicles was unable to report heading data and 
was, therefore, excluded from STC algorithm analysis. The analysis from the STC 
algorithm reveals that multiple moving clusters are tracked over the twenty-three minute 


period, which is in keeping with the drive-plan, which is summarized in Table 9. 


S2 


Table 9. | GNSS real-world data analysis vehicle drive-plan. 
































Vehicle Drive Plan 
1 Confuser: drive alongside/lead/trail/enter/exit convoy, pass other 
confusers 

2 Confuser: drive alongside/lead/trail/enter/exit convoy 

3 Convoy lead: drive at/near speed limit in right lane 

4 Convoy middle: do not change convoy order, vary follow spacing, 
allow confusers to enter/exit convoy 

5 Confuser: drive alongside/lead/trail/enter/exit convoy 

6 Convoy tail: do not change convoy order, vary follow spacing, 
allow confusers to enter/exit convoy 





4. GNSS Real-world Data Results 


According to the drive-plan, vehicles 3, 4, and 6 form the convoy and should 
regularly be clustered as a moving cluster over the span of the twenty-three minute 
dataset. Vehicle 3 did not transmit heading data, so it was not analyzed by the STC 
algorithm. The remaining vehicles should move in and out of the convoy and form 
various moving clusters during the run. From the visual representation in Figure 24 and 
the text output in Figure 25, it is evident that several moving clusters are tracked in the 
analysis of the dataset. Vehicles 4 and 6, which form the convoy, are not explicitly 
clustered together for the length of the analysis but are detailed in the “Contacts of 
Interest” section as a moving cluster that joins and departs other clusters. This 
designation provides evidence that contacts 4 and 6 move together for much of the 
twenty-three minute run as directed in the drive-plan. With user thresholds set as 
previously detailed, thirteen moving clusters are identified during the analysis. The 
confuser vehicles moving in and out of the convoy account for a majority of these cluster 


changes. In Figure 24 moving cluster 9 consisting of vehicles 1, 2, and 6 is identified. 
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Figure 24. STC algorithm visual output for a GNSS real-world data analysis of vehicles 
imitating small boats on 24 April 2013. 


In summary, the implementation of the STC algorithm in the MATLAB 
programming environment was detailed in this chapter. A synthetic scenario was used to 
verify the algorithm’s filtering and clustering logic and user-provided threshold limits. 
Two real-world datasets were used to validate the STC algorithm’s performance. The 
first used AIS data and a hypothetical scenario to analyze the interactions of a 
commercial VOI. The second used GNSS data consisting of vehicles simulating small 


boats operating at higher speeds and closer distances than the commercial AIS scenario. 
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Moving Clusters: 


Cluster 1 containing contacts 5 6 begins at time 5 and ends at time 24 
Cluster 2 containing contacts | 2 begins at time 5, is a current cluster, but gains or loses members 
Cluster 3 containing contacts 4 5 6 begins at time 8 and ends at time 13 
Cluster 4 containing contacts 1 5 begins at time 9 and ends at time 12 
Cluster 5 containing contacts 4 6 begins at time 12 and ends at time 12 
Cluster 6 containing contacts 2 4 begins at time 14 and ends at time 14 
Cluster 7 containing contacts 4 5 begins at time 16 and ends at time 26 
Cluster 8 containing contacts 1 4 5 begins at time 19 and ends at time 19 
Cluster 9 containing contacts | 2 6 begins at time 22 and ends at time 22 
Cluster 10 containing contacts 2 6 begins at time 23 and ends at time 23 
Cluster 11 containing contacts 1 4 begins at time 24 and ends at time 24 
Cluster 12 containing contacts 1 2 5 begins at time 25 and ends at time 25 
Cluster 13 containing contacts 1 2 4 begins at time 27 and ends at time 27 


Contacts of Interest: 


Contacts 4 6 are a moving cluster that join and depart other larger clusters. 

Contact 5 joins a cluster at time 5, departs the cluster, and then rejoins it, finally departing at time 26. The contact 
departs on average course 249 at average speed of 25 knots. 

Contact 2 joins a cluster at time 5, departs the cluster, and then rejoins it, finally departing at time 27. The contact 
departs on average course 297 at average speed of 5 knots. 





Figure 25. STC algorithm text output for GNSS real-world data analysis of vehicles imitating small boats on 24 April 2013. 
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V. CONCLUSIONS 


The focus of this thesis was to develop a STC algorithm to aid in the Navy’s 
mission of MDA. Specifically, the STC algorithm was designed to autonomously analyze 
vessel interactions to gain behavioral knowledge of a water space. An extension of the 
urban combat environment direction and displacement directivity algorithms was 
proposed as a method of classifying paralleling and following movement which may be 


exhibited by vessels involved in illicit activity. 


The general framework of STC in this thesis was to form clusters kinematically 
and then refine them temporally. Clusters were formed through proximity filtering with 
k-means clustering, kinematically refined by determination of the similarity of vessel 
course, speed, and distance between vessels, and further refined through temporal 
analysis. This methodology was deemed to effectively provide a refined kinematic 
clustering result that carefully balanced user input with mathematical reasoning. The 
kinematic clusters were then temporally analyzed to determine which clusters were 
moving in time and which were not. Spatiotemporal results were then processed to 
produce usable text and visual outputs that result in better understanding of vessel 


interactions and behavioral insight in a water space. 


A. SIGNIFICANT RESULTS 


The work presented in this thesis provides three contributions to the MDA 
problem set. First an STC scheme was developed and applied to the maritime domain to 
identify interactions between vessels at sea in order to gain behavioral knowledge of 
paralleling and following movement in a water space. The scheme was validated using 
multiple data sources that provided a real-world hypothetical scenario in the Strait of 


Malacca and a simulated movement of small boats. 


Second, three filters were designed in support of the operation of the STC 
algorithm. Location filtering enabled a large input dataset to be focused onto a specific 
water space of interest. The proximity filter spatially refined the input dataset by 
grouping contacts via k-means clustering centroid assignment. Kinematic filtering then 


of 


determined the similarity of vessel course and speed within each k-means cluster and 
further spatially refined the clustering process by comparing the distance between 


vessels. 


Finally, the STC algorithm provided those interested in national or theater 
security decision making the opportunity to quantify the (membership) continuity of a 
moving cluster. A group of vessels deemed to be exhibiting paralleling or following 
behavior by the algorithm were tracked until the continuity fell below the user-defined 
threshold of interest. The opportunity to quantify the continuity allows the user to 


characterize the degree to which a moving cluster may change. 


B. FUTURE WORK 


In this thesis the focus of the STC algorithm was on a water space of interest. Use 
of the VOI identifier as input to determine the water space of interest during the time of 
interest would improve the algorithm’s performance. Future work should extend the 


algorithm to include VOI identity if available. 


An extension of this work would be to focus on countering the knowledgeable 
enemy. In this thesis the assumption was made that vessels will not attempt to prevent 
detection of their behaviors. If, for instance, vessels rendezvoused with other vessels in a 
way that inhibited the algorithm from detection as a moving cluster, the advantage 
provided by the STC algorithm would be nullified. Perhaps the vessels would meet via a 
head-on scenario or would steer headings that were always greater than 90° in difference 
in order to avoid detection. Expanding the current algorithm to detect a head-on meeting 


scenario would improve its performance. 


The STC algorithm presented in this thesis does not provide track estimation 
functions. In future work, when a position report is missing or unreadable, the vessel’s 
movement could be tracked through the use of a Kalman filter or a similar method. This 
would prevent negative effects on the formation or tracking of a moving cluster and 


would allow for greater insight and analysis of the water space. 
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The STC algorithm’s scope could be expanded to include more than commercial 
vessels. Given the speed that a small boat swarm might attack with, early detection of 
their formation is an important task. An extension of this thesis work could be applied to 
the detection of such a swarm by applying short time windows and small thresholds for 
distance, heading, and speed differences between contacts. Although the second real- 
world dataset was meant to mimic small boats, conducting the STC analysis on actual 


small boat data would be beneficial. 
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APPENDIX 


The MATLAB code used to implement the spatiotemporal clustering analysis is 


provided in the appendix. 


function Thesis 


% Thesis Top-Level Function Call. 
% Thesis is the function call to access spatiotemporal 


algorithms that support maritime domain awareness. The user will 
select whether their input data is “Synthetic” or “Real-World,” 

and the program will determine which set of functions to use for 
analysis. 

% Created by LT Kristofer Tester, USN, April 2013 


6 Function Input: None 
% Function Output: Proper file path alignment 





clear all 
close all 
cle 

t = cputime; 








% Display a menu offering the user choice of data type 
Zz = menu(‘Select Source Data Type’, ’Synthetic’, ’Real-World’); 


% Process user selection and choose next function call. Matlab 
directory changes to reflect user selection, and to keep data 
files separated and easily organzied. 


if 2 == 
oldFolder = cd(‘Thesis Synthetic’); 
Synthetic; 
cd(oldFolder); 

else 


oldFolder = cd(’Thesis RealWorld’); 
RealWorld; 
cd(oldFolder); 

end 


% Display the time, in seconds, the program takes to run in its 


entirety 
t = cputime 
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function RealWorld 





% REALWORLD Top-level function call for real-world data 
analysis. 
% RealWorld is the top-level function call for real-world AIS 








data analysis. This function begins with data pre-processing and 

filtering, and includes all functions required for spatiotemporal 
analysis of the waterspace. Created by LT Kristofer Tester, USN, 

April 2013 





% Call function to read data input and perform preprocessing 

duties 

[A,default,climit, spdlimit,dlimit, spdinput, storage] =... 
InputandPreprocessing; 


% Loop through the data by time-step. At each time-step, arrange 
a submatrix, B, of the A matrix, such that all members of B have 
the same time-step assignment. Proximity filter and kinematically 
cluster the contacts in the B matrix using the KDVC function and 
return the results in clusters. Store clusters in the cell array, 
compold. 
[compold, Data,maxtime,clustersl] = 
ProxKin(A,default,climit,spdlimit,... 
dlimit, spdinput, storage) 




















% Call function Temporal to perform temporal analysis of the 
kinematic time step snapshots 
[MovingClusters, FuseClusters, DiffClusters, CurrentClusters, timecou 
ne), sas 

Temporal (compold, storage) 





% Build the text output by comparing each cell of MovingClusters. 
If the cell is found elsewhere, determine what time-step it 
appears, and what time-step it disappears. Output this 
information, along with the membership details of the moving 
cluster in a text format. 
Postprocessing(MovingClusters, FuseClusters, DiffClusters, CurrentCl 
usters,... 


timecount, Data,maxtime, clusters1) 














62 


function [A,default,climit, spdlimit,dlimit,spdinput, storage]. 


= InputandPreprocessing; 


{o} 





stream and formats it for use by the algorithm. 


% Created by LT Kristofer Tester, 





USN, April 2013 





% InputandPreprocessing This function reads the input data 


% Display push-button menu for user selection of the input data 


file type 





a = menu('Select Data Format:', 'xlsx Format','.mat Format', 'csv 


Format'); 


% List all files of the type xlsx and .mat in the current 
directory and store the filenames in TestList cell array for user 





selection 

if a == 1; 
Téestrides = dir ("*xi sx"); 
TestList = {}; 
for i = 1l:length(TestFiles) 


filename = TestFiles(i).name; 





TestList{i} = filename; 
end 


z = menu('Choose the Source Data File',TestList); 


Data = xlsread(TestList{z}); 
elseif a == 2; 

TestFiles = dir('*.mat'); 

for b = 1l:length(TestFiles) 








filename = TestFiles(b).name; 


TestList{b} = filename; 
end 


z = menu('Choose the Source Data File',TestList); 


Data = load(TestList{z}); 

















cellData = struct2cell (Data); 

matData = cell2mat(cellData); 

Data = matData; 

elseif a == 3; 

TestFiles = dir('*csv'); 

for j = 1l:length(TestFiles) 
filename = TestFiles(j).name; 
TestList{j} = filename; 


end 


z = menu('Choose the Source Data File',TestList); 


Data = csvread(TestList{z}); 
end 





Z = 


menu('Choose Global Location or Manually Input Area of 


Coordinates',. 








"Strait of Malacca', 'West Coast of Africa', 'Panama Canal 


Mandeb',... 
"Manual', 'Entire Area'); 
iz; S=e 


locationFiltered locationFil 


= 


elseif z == 2; 














elseif z == 3; 





locationFiltered locationFil 
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locationFiltered locationFil 


lter (Data,2,102,4,4); 


lter (Data, 20,-18,5,5); 





lter (Data, 7,-80,5,5); 


v 
aon A 


Interest 


"Bab Al 


elseif 
lo 
elseif 
ha 
ce 
ce 





Z == 4; 
cationFiltered = 
Z == 5: 


locationFilter (Data,12.5,43.5,5,5); 


lA 
ndles = guihandles(LocationFilt); 


, 
ntlat = get (handles.sliderl, 'Value 
ntlon = get (handles.slider2, 'Value 
ight = get (handles.slider3, 'Value' 

") 





dth = get (handles.slider4, 'Value 











onFilter (Data, centlat,centlon,width,height) ; 





Data; 


he 
wi 
close (handles.figurel); 
locationFiltered = 
locati 
elseif z == 6; 
locationFiltered = 
end 





% Store the source data in the Data variable and assign variables 
for the necessary columns of the Data matrix 
locationFiltered; 


Data = 
Lon = 


Data(:,6); 


Data(:,7); 
Data(:,8); 
Data(:,5); 
Data(:,10); 
vr = Data(:,2); 








% Create the A matrix - the variables from Data that are 


ecess 


ary for analysis 


n 
A = [Lon Lat Cog Sog counter UTC]; 


{o} 


defaul 


[A] = 


% Format the A matrix using Data_Read 


t = menu('Parameter Selection', 'Default Values',... 
"User Selected Values'); 
Data_Read (A, default); 


% Add longitude and latitude component of each contact to the A 


matrix 
A= [A 


{o} 





Lon Lat]; 


for use in output formats 





% Initialize looping variables and determine the unique MMSI 


identifiers that are reported in the A matrix 


sizeda 
n= 1; 
k = un 


{2} 


contac 


figure 
while 
J 


ta = size(A); 


ique(A(:,7)); 


% Loop through the unique MMSI identifiers and quiver plot each 


t. This plot represents all of the individual contacts 
reported in the data set. 


(); 
n <= length(k) 


= find(A(:,7)==k(n)) 


scatter (A(j,9),A(j,10)); 


title('Example Contact Picture') 


x 


abel('Degrees of 








v1 





abel('Degrees of 





Longitude") 
Latitude") 





set(gcf,'color','w'); 


ho 
n 


ld on 
= n+l; 
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end 
plot_google_map 


ole 


Initialize looping variables 
= 1; 


| 


% Create slider and text input GUI for user input 








if default == 1; 
climit = 39; 
spdlimit = 16; 
dlimit = 8; 
spdinput = 2; 
storage = 0.6; 
elseif default == 2; 
handles = guihandles(SlideFilt); 
climit = get (handles.sliderl, 'Value'); 


spdlimit = get (handles.slider2, 'Value'); 

dlimit = get (handles.slider3, 'Value'); 

spdinput = get (handles.slider4, 'Value'); 

storage = (get (handles.slider5, 'Value'))/100; 
close (handles.figurel); 

end 














{o} 


% Loop through the A matrix to ensure none of the variables have 
a value of exactly zero 
while n <= sizedata(1) 








if A(n,1) == 0 
A(n,1) = 0.001; 
else 
A(n,1) = A(n,1); 
end 
if A(n,2) == 0 
A(n,2) = 0.001; 
else 
A(n,2) = A(n,2); 
end 
if A(n,4) == 0 
A(n,4) = 0.001; 
else 
A(n,4) = A(n,4); 
end 
if A(n,3) == 0 
A(n,3) = 0.001; 
else 
A(n,3) = A(n,3); 
end 
if A(n,5) == 0 
A(n,5) = 0.001; 
else 
A(n,5) = A(n,5); 
end 
if A(n,6) <= spdinput; 
— A(n,:) = 0; 
else 
A(n,6) = A(n,6); 
end 
n = n+l; 
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end 





% Assign variables 
xX = nonzeros(A(: 


or the columns of the A matrix 





rl))s 
y = nonzeros(A(:,2)); 
dx = nonzeros(A(:,3)); 
dy = nonzeros(A(:,4)); 
cse = nonzeros(A(:,5)); 
spd = nonzeros(A(:,6)); 
counter = nonzeros(A(:,7)); 
timestep = nonzeros(A(:,8))j; 
Lon = nonzeros(A(:,9)); 
Lat = nonzeros(A(:,10)); 


% Store the variables back into the A matrix 
A = [x y dx dy cse spd counter timestep Lon Lat]; 
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function [locationFiltered] = locationFilter(entry,centralLat,... 
centralLon, width, height) 





% LOCATIONFILTER Filter real-world data by defining a boundary 
box. 
% [locationFiltered] = 


LOCATIONFILTER(entry, centralLat, centralLon,width,height) filters 
the data provided in entry that is located inside a boundary box 
defined by the centalLat, centralLon, width, and height. Entry is 
a matrix of AIS position reports with latitude and longitude 
stored in tenths of degrees. CentralLat/Lon indicate the center 
of the area of interest in tenths of degrees of latitude and 
longitude. Width and Height indicate teh size of the area of 
interest in tenths of degrees of latitude and longitude. 

% Created by LT Ashley McAbee, USN, April 2013 











{o} 


%& Define the boundary box 

latLowBound = centralLat - height/2; 
latUpBound = centralLat + height/2; 
longLowBound = centralLon - width/2; 
longUpBound = centralLon + width/2; 





& Find all datapoints in entry that are positioned inside the 
boundary box, and store them in the output variable, 
locationFiltered. 
locationFilteredIdx = find(entry(:,7) > latLowBound & ‘ 
entry(:,7) < latUpBound & entry(:,6) > longLowBound & 
entry(:,6) <... 
longUpBound) ; 
locationFiltered = entry(locationFilteredIdx,:)j; 
end 





67 


function [A] = Data_Read(A,default) 








% DATA READ Data conversion file for real-world AIS datasets. 
% [A] = DATA_READ(A) returns a reformatted A matrix after 





converting latitude and longitude to x and y coordinates where 
the origin is zero degrees latitude and zero degrees longitude. 
The function also converts course and speed into dx and dy 
velocity components. 

% Created by LT Kristofer Tester, USN, April 2013 


% Store the incoming data from the A matrix into appropriate 
columns to assist in determining the degrees, minutes, and 
seconds of latitude and longitude. 








Data(:,1) = floor(A(:,2)); 
Latmin = (A(:,2)-Data(:,1))*60; 
Data(:,2) = floor(Latmin); 
Latsec = (Latmin-Data(:,2))*60; 
Data(:,3) = floor(Latsec); 
Data(:,4) = floor(A(:,1)); 
Lonmin = (A(:,1)-Data(:,4))*60; 
Data(:,5) = floor(Lonmin) ; 
Lonsec = (Lonmin-Data(:,5))*60; 
Data(:,6) = floor(Lonsec); 
Data(:,7) = A(:,3); 

Data(:,8) = A(:,4); 

Data(:,9) = A(:,6); 

counter = A(:,5); 


% Organize degrees, minutes, and seconds of latitude and 
longitude into components for x and y coordinate plotting. 
1(:,1) = Data(:,1); 
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% Initialize looping variables and loop through each row of the 
Data matrix. Determine the degrees of latitude of the contact in 
order to determine the conversion matrix, LatDistConv. Multiply 
the conversion matrix by the y components from above to determine 
a y coordinate on the grid signifying the distance, in nautical 
miles, from zero degrees latitude. 
n= 1; 
while n <= length (Data) 

if Data(n,1) <= 15 

variable = Data(n,1)*((110.649-110.574)/15)+110.574; 























LatDistConv = [variable*0.62 (variable*0.62)/60 
((variable*0.62)/60)/60]; 

y(n,1) = yl(n,:)*LatDistConv'; 

n = n+l; 


elseif Data(n,1) > 15 & Data(n,1) <= 30 
variable = Data(n,1)*((110.852-110.649)/15)+110.649; 
LatDistConv = [variable*0.62 (variable*0.62)/60 
((variable*0.62)/60)/60]; 
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y(n,1) = yl(n,:)*LatDistConv'; 


elseif Data(n,1) > 30 & Data(n,1) <= 45 
variable = Data(n,1)*((111.132-110.852)/15)+110.852; 











LatDistConv = [variable*0.62 (variable*0.62)/60 
((variable*0.62)/60)/60]; 

y(n,1) = yl(n,:)*LatDistConv'; 

n = n+l; 


elseif Data(n,1) > 45 & Data(n,1) <= 60 
variable = Data(n,1)*((111.412-111.132)/15)+111.132; 








LatDistConv = [variable*0.62 (variable*0.62)/60 
((variable*0.62)/60)/60]; 

y(n,1) = yl(n,:)*LatDistConv'; 

n = n+l; 


elseif Data(n,1) > 60 & Data(n,1) <= 75 
variable = Data(n,1)*((111.618-111.412)/15)+111.412; 








LatDistConv = [variable*0.62 (variable*0.62)/60 
((variable*0.62)/60)/60]; 

y(n,1) = yl(n,:)*LatDistConv'; 

n = n+l; 


elseif Data(n,1) > 75 & Data(n,1) <= 90 
variable = Data(n,1)*((111.694-111.618)/15)+111.618; 








LatDistConv = [variable*0.62 (variable*0.62)/60 
((variable*0.62)/60)/60]; 

y(n,1) = yl(n,:)*LatDistConv'; 

n = n+1; 


end 
end 


% Initialize looping variables and again loop through each 
element of the Data matrix to determine the proper conversion 


= 


matrix for longitude, LonDistConv. The number of nautical miles 
in a degree of longitude decreases as distance from the equator 
increases. In order to determine the LonDistConv matrix, first 
determine the degree of latitude of the contact, and then 
interpolate to determine the conversion matrix. 
n= 1; 
while n <= length(Data) 

if Data(n,1) <= 15 

variable = Data(n,1)*((107.551-111.320)/15)+111.320; 






































LonDistConv = [variable*0.62 (variable*0.62)/60 
((variable*0.62)/60)/60]; 

x(n,1) = xl(n,:)*LonDistConv'; 

n = n+l; 


elseif Data(n,1) > 15 & Data(n,1) <= 30 
variable = Data(n,1)*((96.486-107.551)/15)+107.551; 











LonDistConv = [variable*0.62 (variable*0.62)/60 
((variable*0.62)/60)/60]; 

x(n,1) = xl(n,:)*LonDistConv'; 

n = n+l; 


elseif Data(n,1) > 30 & Data(n,1) <= 45 
variable = Data(n,1)*((78.847-96.486) /15)+96.486; 








LonDistConv = [variable*0.62 (variable*0.62)/60 
((variable*0.62)/60)/60]; 

x(n,1) = xl(n,:)*LonDistConv'; 

n = n+l; 





elseif Data(n,1) > 45 & Data(n,1) <= 60 
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variable 


Lonl 





DistConv = 


= Data(n,1)*((55.800-78.847) /15)+78.847; 





((variable*0.62)/60)/60]; 
xl(n,:)*LonDistConv'; 


x(n,1) 


n 


elseif 
variable 


Lonl 





n+1; 
Data(n,1) 


DistConv = 


[variable*0.62 


(variable*0.62)/60 


> 60 & Data(n,1) <= 75 


= Data(n,1)*((28.902-55.800) /15)+55.800; 





((variable*0.62)/60)/60]; 
x1l(n,:)*LonDistConv'; 


x(n,1) 


n 


elseif 
variable 


Lonl 





n+1; 
Data(n,1) 


DistConv = 


[variable*0.62 


(variable*0.62)/60 


> 75 & Data(n,1) <= 90 


= Data(n,1)*((0-28.902)/15)+28.902; 





((variable*0.62)/60)/60]; 
x1l(n,:)*LonDistConv'; 


x(n,1) 


FY 


end 


end 


In 
se = 
= 1; 


Cr 


n+1; 


oop = length(c); 


[variable*0.62 


itialize looping variables. 


(variable*0.62)/60 


% Loop through the matrix to convert course headings in degrees 

into units of radians on the unit circle. 

while n <= loop 
if c(n) == 


c(n) 


= 





4 


“3 


sel 
Cc 
sel 





7 


Cc 
sel 


£ 





Cc 


lsei 


eS 








a) 


c(n) 
elseif c( 
c(n) 


sel 


ie 








=) 


Cc 
sel 





= 





c(n) 


end 
n=n+1; 


end 


% Determine dx and dy components of 





Li) 
1) 


% Call 


the TimeFuseReal function to 


(n) 


(n) 


QO =O =O 


Q 


— 


Cc 


lobrsatrsalsatrats 


Q 


(s/1 
(s/1 





0 


e(n)*(pi/180)=(pi/2); 


~~ 


e(n)*(pi/180)+(pi/2); 


—_ 


c(n)*(pi/180)-(pi/2); 
>0 & c(n)<90 


—_ 


== 90 


== 180 


== 270 


= c(n)*(pi/180)+(pi/2); 


(pi/2)-—(c(n)*(pi/180)); 
)>90 & c(n)<180 
(pi/2)-—(c(n)*(pi/180)); 
)>180 & c(n)<270 
(450-c(n))* (pi/180); 
)>270 & c(n)<360 
(450-c(n))* (pi/180); 





into time-steps. 





velocity as a function of 


convert the data timestamps 


timestep = TimeFuseReal (Data(:,9),default); 
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% Store the finalized data in the A matrix to be sent back to the 
RealWorld function for analysis. 
A = [x y dx dy cse s counter timestep]; 


7 


function [compold, Data,maxtime,clustersl] = 
ProxKin(A,default,climit,... 
spdlimit,dlimit, spdinput, storage) 








% ProxKin Proximity and kinematic filtering function. 

% ProxKin uses k-means clustering to proximity filter the data 
and then kinematically filter those results by determining 
similarity of vessel courses, speeds, and the distance between 
them. 

% Created by LT Kristofer Tester, USN, April 2013 





% Determine the maximum time-step that has been assigned to the 
data 
maxtime = max(A(:,8)); 





% Initialize looping variables 
m= Ll 

compar = []; 

compold = []; 

time = 1; 

Data = []; 

clustersl = []; 


while time <= maxtime 





clusters = []; 

B= ([]; 

sizeA = size(A); 

n= 1; 

while n <= sizeA(1) 
if A(n,8) == time; 

B= [B;A(n,:)]; 

else end 
n = n+1; 

end 

sizeB = size(B) 


, 
if sizeB(1) >= 2 


% Ensure no global identifier is reported more than one 
time during each time-step. If one is, use the 
deconflict function to resolve the issue. 

B = deconflict (B, time) ; 

















Data = [Data;B]; 
[clusters] = 
KDVC (B, clusters, time, climit,dlimit,spdlimit) ; 
sizeclusters = size(clusters); 
clustersl = [clustersl;clusters]; 
if any(clusters) 
sizeclusters = size(clusters); 
maxcluster = max(clusters(:,8)); 
J, = Ay 
while j <= maxcluster 
h = 1; 
while h <= sizeclusters (1) 
if clusters(h,8) == j 
k = [k;clusters(h,7)]; 
else 
end 


ie: 


h = htl; 


























end 
if time == 
if j == 
compar = [compar,k]; 
else 
sizek = size(k); 
sizecompar = size(compar) ; 
if sizek(1) == sizecompar (1) 
compar = [compar,k]; 
elseif sizek(1l) > sizecompar (1) 
m = sizek(1)-sizecompar (1); 
n= 1; 
while n <= Mm 
compar = 
[compar; zeros(1,sizecompar (2))]; 
n = n+1; 
end 
compar = [compar,k]; 
else sizek(1) < sizecompar (1) 
m = sizecompar(1)-sizek(1); 
n= 1; 
while n <= Mm 
k = [k;0]; 
n = n+1; 
end 
compar = [compar,k]; 
end 
end 
else time ~= 1 
if j == 
compar = [compar,k]; 
else 
sizek = size(k); 
sizecompar = size(compar) ; 
sizecompold = size(compold) ; 
if sizek(1) == sizecompar (1) 
compar = [compar,k]; 
elseif sizek(1l) > sizecompar (1) 
m = sizek(1)-sizecompar (1); 
n= 1; 
while n <= Mm 
compar = 
[compar; zeros(1,sizecompar(2))]; 
n = n+1; 
end 
compar = [compar,k]; 
else sizek(1) < sizecompar (1) 
m = sizecompar(1)-sizek(1); 
n= 1; 
while n <= Mm 
k = [k;0]; 
n = n+1; 
end 
compar = [compar,k]; 
end 
sizecompar = size(compar) ; 
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end 
compold{time} = compar; 
compar = []; 
ele 
else 
end 
else 
end 
time=time+1; 
end 
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function B = deconflict(B, time) 


% DECONFLICT Deconflict multiple contact reports. 

% B = DECONFLICT(B,time) returns the B matrix ensuring that no 
contact global identifier is reported more than one time per 
time-step, which is denoted in the variable time. 

% Created by LT Kristofer Tester, USN, April 2013 


ole 


Note —- The B matrix is the A matrix, just named differently 


% Determine if any global identiOfiers are repeated in the B 

matrix, and if so, which ones 

uniqueB = unique(B(:,7)); 

if length(uniqueB) > 1 
countOfB = hist(B(:,7),uniqueB) ; 





indexToRepeatedValue = (countOfB~=1); 
repeatedValues = uniqueB(indexToRepeatedValue) ; 
else 
repeatedValues = uniqueB; 
end 


% Determine the size of the B matrix 
sizeB = size(B); 





{} 


% Initialize looping variables 




















current = 0; 

m= 1; 

spot = []; 

boom = 0; 

% Loop through the repeated global identifiers and either: 

% 1. If the distance between reports is less than 0.5 
nautical miles, average the contacts as one. 

% 2. If the distance between reports is greater than 0.5 
nautical miles, keep the contacts separate and give one of thema 


new global identifier. 
while m <= length(repeatedValues) 
C= [1]; 
spot = []; 
n= 1; 
while n <= sizeB(1) 
if B(n,7) == repeatedValues(m) ; 
C = [C;B(n,:)]; 
spot = [spot;n]; 
else end 
n = n+1; 











end 
sizeC = size(C); 
if sizeC(1) == 
if sqrt((C(1,1)-C(2,1))*2+(C(1,2)-C(2,2))%*2) <= 20; 


{} 


% Average the two contacts' components 








avgxpos = (C(1,1)+C(2,1))/2; 
avgypos = (C(1,2)+C(2,2))/2; 
avgdx = (C(1,3)+C(2,3))/2; 
avgdy = (C(1,4)+C(2,4))/2; 
avgcse = (C(1,5)+C(2,5))/2; 


a 


avgspd = ( 
avglon = ( 
avglat = ( 
if avgxpos 
avgxpos = 





else 
end 
if avgypos == 0 
avgypos = 0.0001; 
else 

end 

if avgdx == 0 

avgdx = 0.0001; 
else 

end 

if avgdy == 0 

avgdy = 0.0001; 
else 

end 

if avgcse == 
avgcse = 


(exe) 


.0001; 
else 
end 
if avgspd == 
avgspd = 


Coo 


-0001; 
Slee 
end 
if avglon == 
avglon = 


Coo 


.0001; 


else 
end 
Fond 


if avglat == 
avglat = 


(exe) 


.0001; 





else 
end 
B(spot(1),:) = [avgxpos avgypos avgdx avgdy avgcse 





avgspd... 
repeatedValues(m) time avglon avglat]; 
B(spot(2),:) = B(spot(2),:)*0; 
sizeB = size(B); 
else 
end 
sizeB = size(B); 
else 
sizespot = size(spot); 
a= 1; 
while a <= sizespot (1) 
if any(spot(a)); 
b = atl; 
while b <= sizespot (1) 
if C(a,7) ~= 0 & C(b,7) ~= 
dist = sqrt((C(a,1)-C(b,1))%2+(C(a,2)- 





C(b,2))%*2); 
if dist <= 20; 


{2} 


% Average the contacts' components 
avgxpos = (C(a,1)+C(b,1))/2; 
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avgypos = 
avgdx = ( 
avgdy = ( 
avgcse = 

avgspd 
avglon 
avglat = 
if avgxpos 
avgxpos = 


C 
C 
( 
( 
( 
( 


( 
( 
( 
C 
C 
C 
C 





Cc 
a 
a 
( 
( 
( 
( 


. 
’ 
’ 
a 
ay 
1, 
1, 


else 

end 

if avgypos == 
avgypos = 0.0001; 
else 

end 

if avgdx == 0 

avgdx = 0.0001; 
else 

end 

if avgdy == 0 

avgdy = 0.0001; 
else 

end 

if avgcse == 0 
avgcse = 0.0001; 


Coo 


.0001; 


avglon = 0.0001; 


if avglat == 
avglat = 


Os 


.0001; 





else 
end 
B(spot(a),:) = [avgxpos avgypos avgdx 
avgdy... 
avgcse avgspd repeatedValues (m) 
time... 
avglon avglat]; 
B(spot(b),:) = B(spot(b),:)*0; 
C(a,:) = B(spot(a),:); 
current H 
spot (b) QO; 
ee Penge (onees Os 
a = at+sizeC(1); 
b = bt+sizeC(1); 





h- 


(spot)) <= 1 








sizeB = size(B); 
b = b+l1; 








else 
b = bt+l1; 
end 
end 
if current == 1 
spot(a) = 0; 
current = Q; 
else 
current = Q; 
end 
a = atl; 
else 
a = atl; 





end 


% Create a new contact global identifier 


sizespot = size(spot); 
J = 1; 
sizeB = size(B); 
end 
m = m+l1; 
end 


% Redefine the B matrix for output. This B matrix will have no 


repeated global identifier values. 

B= anode ca nonzeros(B(:,2)) nonzeros(B(:,3))... 
nonzeros(B(:,4)) nonzeros(B(:,5)) nonzeros(B(:,6))... 
nonzeros(B(:,7)) nonzeros(B(:,8)) nonzeros(B(:,9))... 
nonzeros(B(: 10)913 
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function [clusters] = 
KDVC (A, clusters, time, climit,dlimit,spdlimit) 





% KDVC K-means, distance, and displacement clustering. 

% [clusters] = KDVC(A,clusters,time, climit,dlimit, spdlimit) 
returns the kinematic clusters at a given time-step based on user 
inputs where A is the input dataset, clusters is the matrix for 
output storage, time is the time-step being evaluated, climit is 
the user defined course limit in degrees, dlimit is the user 
defined distance limit in nautical miles, and spdlimit is the 
user definied speed limit in knots. 
% Created by LT Kristofer Tester, USN, April 2013 














% Determine the size of the input dataset, A 
sizeA = size(A); 





% Determine the sample of contacts from A to be used in 
determining the centroids of k clusters using subset-farthest-— 
first methodology 





sample = round(sizeA(1)); 
syms w 
k = round(solve(sample == 2*w*log(w))); 


& Call for function sff in which subset-farthest-first 
methodology is implemented. Returns a matrix, kstart, which 
contains the centroids for k-means clustering. 
if sample >= 3 

eval(['[kstart] = CentroidSff(A,k,sample)']); 
else 

kstart = A(l:sample,:); 
end 





% Conduct k-means clustering on the dataset A using the centroids 
stored in kstart. Store the results as B 


sizekstart = size(kstart); 
if sizekstart(1) == sizeA(1) 
if kstart ~=A 
B = kmeans (A, [],'start',kstart, 'EmptyAction','drop'); 
else 
B = ones(sizekstart(1),1); 
end 
else 


B = kmeans (A, [],'start',kstart, 'EmptyAction','drop'); 


% Redefine the A matrix to incorporate the results of k-means 
clustering, B 

A = [A(:,1) A(:,2) A(:,3) A(:,4) A(:,5) A(:,6) A(:,7) B A(:,8) 
A(:,9) A(:,10)]; 


% Initialize loop variable which aids in titling of the plots 
provided by the function 
h = 1; 


% Define the default line style for a one 
set (0, 'DefaultAxesLineStyleOrder', {'-','--',':"'}) 








ie 


% Begin loop for clustering analysis of A matrix and continue 
loop as long as A has non-zero members 
leginfo = {}; 
while any(A) == 1 

% Determine the row of the A matrix that has the maximum 
speed value. 


{2} 


% Use the corresponding contact as the seed contact. 


[a,b] = max(A(:,6)); 
c = A(b,:); 
A(b,:) = A(b,:)*0; 


& Call function Kinematic which is the actual clustering 
algorithm. Returns the A matrix along with those contacts that 
are clustered together inc 

eval(['[A,c] = Kinematic(A,c,climit,dlimit,spdlimit,h)']); 


% Determine the size of the c matrix 
sizec = size(c); 





% If contacts were clustered together, drop in and plot those 
contacts. If contacts were not clustered together, plot nothing 
and loop to 

if sizec(1) > 1 


% Ensure the contacts in c are sorted by global 
identifier in ascending order 

update = c; 

update = sortrows (update,7); 

clusters = [clusters;update]; 


% Quiver plot the contacts that are clustered together in 
each time-step 

quiver (c(:,10),c(:,11),c(:,3),c(:,4)); 

title('Kinematic Microclusters') 

xlabel('Position in miles from the origin') 

ylabel('Position in miles from the origin') 


leginfo{h} = ['c_' num2str(h)]; 
grid on 

hold on 

set(gcf,'color','w'); 

h=h +41; 


% Determine the size of the matrix clusters and add a row 
of zeros to separate each cluster found in each time-step 





sizeclusters = size(clusters); 
clusters = [clusters;zeros(1,sizeclusters(2))]; 
else sizec(1) < 1 


end 
end 


% Provide the legend information for the cluster plot 
if isempty(leginfo) == 
legend (leginfo) 
elec 
end 
hold off 
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function [kstart] = CentroidSff(A,k,sample) 





% SFF Subset farthest first function 
% [kstart] = SFF(A,k,sample) returns the centroid positions for 


use in k-means clustering that are determined using subset 
farthest first technology, where A is the input data matrix, k is 
the number of centroids to determine, and sample is the size of 
the subset of the A matrix to be considered. 

% Created by LT Kristofer Tester, USN, April 2013 


ole 
ll 


tialize looping variables 


= size(A); 

sizeA(1); 

= round(sizeA(1)/2-k/2); 
A; 


WHEN NUD 


% Initialize a looping variable and assign the first centroid to 


% After assigning the first member of kstart, reconfigure the B 
matrix to exhaust that member 


X = nonzeros(B(:,1)); 

y = nonzeros(B(:,2)); 

dx = nonzeros(B(:,3)); 

dy = nonzeros(B(:,4)); 

cse = nonzeros(B(:,5)); 

spd = nonzeros(B(:,6)); 
counter = nonzeros(B(:,7)); 
t = nonzeros(B(:,8)); 

lon = nonzeros(B(:,9)); 

lat = nonzeros(B(:,10)); 

B = [x y dx dy cse spd counter t lon lat]; 





% Loop through the B matrix and assign follow-on members of the 
kstart matrix (centroids) as the contact that is farthest from 
the previous selection. 
while i< k 

n= 1; 

sizeB = size(B); 

while n < sizeB(1) 

d(n,i) = sqrt((B(n,1)-kstart(i,1)).*%2 + (B(n,2)- 





kstart(i,2)).%2); 
n = n+l 
end 
[F,I] = max(d(:,1)); 
tS aly 
kstart:Gi,;2)) =} Br, *)¢; 
B(I,:) = B(I,:)*0; 





% Reconfigure the B matrix to account for kstart assignment 
and exhaust further possible consideration of the members. 
X = nonzeros(B(:,1)); 
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y = nonzeros(B(:,2)); 

dx = nonzeros(B(:,3)); 

dy = nonzeros(B(:,4)); 

cse = nonzeros(B(:,5)); 

spd = nonzeros(B(:,6)); 

counter = nonzeros(B(:,7)); 

—— NONEEE OS te (78) )% 

lon = nonzeros(B(:,9)); 

lat = nonzeros(B(:,10)); 

B = [x y dx dy cse spd counter t lon lat]; 


end 
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function [A,c] = Kinematic(A,c,climit,dlimit, spdlimit,h) 





% RANGMaT ae Distance and displacement clustering. 
% [A,c] = Kinematic(A,c,climit,dlimit,spdlimit,h) returns the 


kinematic clusters at a given time-step based on user inputs 
where A is the input dataset, c is the seed contact, climit is 
fined course ane in degrees, dlimit is the user 








the user def 
defined distance limit in nautical miles, spdlimit is the user 
definied speed limit in knots, and h is the time-step. 

% Created by LT Kristofer Tester, USN, April 2013 








% Determine the size of the A matrix and initialize looping 
variable 

sizeA = size(A); 

n= 1; 





% Compare each row of the A matrix to the seed contact inc and 
determine if: 

1. They have the same k-means cluster assignment 

2. The distance between them is less than the user defined 





% 
% 





threshold 

% 3. The difference in their courses is less than the user 
defined threshold 

% 4. The difference in their speeds is less than the user 
defined threshold 





%& If so, cluster them together, set the row in the A matrix to 
zero to exhaust it from further consideration, and store the 














if sqrt((c(1,1)-A(n,1)).*2 + (c(1,2)-A(n,2)).%2) < 
dlimit; 
if (A(n,5) >= c(1,5)-climit) & (A(n,5) <= 
c(1,5)+climit); 
if (A(n,6) >= c(1,6)-spdlimit) & (A(n,6) <= 
c(1,6)+ 
spdlimit); 
e(nt+1,:) = A(n,:); 
A(n,:) = A(n,:)*0; 
else end 
else end 
else end 
else end 
n = n+l; 
end 
x nonzeros(c(:,l 
y = nonzeros(c(:,2 
dx = nonzeros(c(: 
dy = nonzeros(c(: 
cse = nonzeros(c(:, 
spd = nouzE TOS Lek: , 
counter = nonzeros ( 
sizex = size(x); 
h = neOnes (izext 
t = nonzeros(c(:, 
Lon = nonzeros(c(: 
Lat = nonzeros(c(: 


1 
9 


i ed 





84 


+ Redefine the c matrix to account for those contacts deemed to 


be clustered with the seed contact 
[x y dx dy cse spd counter h t Lon Lat]; 


io 


c= 
end 
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function 
[MovingClusters,FuseClusters, DiffClusters,CurrentClusters,... 
timecount] = Temporal (compold, storage) 





Temporal Temporal analysis function of STC algorithm. 

Temporal compares the kinematic cluster snapshots from each 
time-step to determine which clusters are moving through time. 
The results are stored in MovingClusters. FuseClusters contains 
the continuity of the moving clusters. DiffClusters contains the 
vessel identifiers of those vessels that join or depart already 
formed moving clusters. CurrentClusters contains the kinematic 
clusters found at the last time-step of data. Created by LT 
Kristofer Tester, USN, April 2013 


oP ol? 








% A snapshot of each time-step has been analyzed, and the 
kinematic clusters that occur at each time-step are stored in the 
cell array compold. Now initialize looping variables and loop 
through compold to determine if any kinematic clusters were 
stored. If there were, assign a value of one to the variable 
moveon. 


Zz = 1; 

moveon = 0; 
timecount = 0; 
timemat = []; 


while zz <= length(compold) 
if any(compold{zz}) 


moveon = 1; 
timecount = timecount+1; 
timemat = [timemat;zz]; 
ZZ = Zz+1; 

else 
ZZ = zZ2z+1; 


end 
end 





% With moveon equal to one it is known that there are kinematic 
clusters stored in compold. It is now time to perform temporal 
analysis on those kinematic clusters and determine if any of them 
move together over time. Again, while looping through the time- 
steps that contain kinematic clusters, the variable fuse will be 
defined as the intersection of the current time-step's kinematic 











clusters' memberships with the next time-step's kinematic 
clusters' memberships. If fuse is greater than or equal to a user 
defined threshold, store the kinematic cluster as a moving 
cluster. If fuse is less than the threshold or zero, compare the 





current time-step's kinematic clusters' membership to the next 
time-step plus one's clusters' membership in order to recalculate 
fuse. If fuse is still less than the threshold or zero, discard 
the kinematic cluster from consideration of being a moving 
eluster. 

if moveon == 


timemat = sort (timemat) ; 

mm = 0; 

sizetimemat = size(timemat); 
time = timemat(1); 
MovingClusters = {}; 
CurrentClusters = {}; 
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FuseClusters = {}; 

DiffClusters = {}; 

while time <= timemat (sizetimemat (1) ) 
mm = mmt+i1; 











if time == timemat (1); 
B = compold{time}; 
C = compold{timemat (2) }; 


else end 
if time <= timemat (sizetimemat (1)-2); 
D = compold{timemat (mm+2) }; 

else end 

















columnB = size(B); 
columnC = size(C); 
columnD = size(D); 
7 '= 4; 
while j <= columnB(2) 
hy = Ls 
sizeB = size(nonzeros(B(:,j)) 


); 
if time <= timemat (sizetimemat (1)-1); 
while h <= columnC (2) 
sizeC = size(nonzeros(C(:,h))); 
Luv = 
intersect (nonzeros(B(:,j)),nonzeros(C(:,h))); 
sizeZ = size(Z); 
if sizeC(1) > sizeB(1); 
fuse = sizeZ(1)/sizeCc ( 
elseif sizeB(1) > sizeC(1); 
fuse = sizeZ(1)/sizeB(1); 
lA 
( 











else sizeC(1) == sizeB(1); 
fuse = sizeZ(l 

end 

katysnumber = 0; 

if fuse >= storage; 
MovingClusters{time,j} = B(:,j); 














FuseClusters{time,j} = fuse; 
if fuse ~= 1 & fuse ~= 0 
X = setdiff(B(:,j3),C(:,h)); 


DiffClusters{time,j} = X; 




















else 
end 
h = columnC(2)+1; 
katysnumber = 1; 
else 
if time <= timemat (sizetimemat(1)-2) & h 


columnD (2); 
sizeD = size(nonzeros(D(:,h))); 
Z = intersect (nonzeros(B(:,j)),... 
nonzeros(D(:,h))); 
sizeZ = size(Z); 
if sizeD(1) > sizeB(1); 
fuse = sizeZ(1)/sizeD(1); 
elseif sizeB(1) > sizeD(1); 
fuse = sizeZ(1)/sizeB(1); 
else sizeD(1) == sizeB(1); 
= sizeZ(1)/sizeB(1); 

















else 


end 





end 
if fuse >= storage; 





MovingClusters{time,j} = ] 


FuseClusters{time,j} = : 


= 








if fuse ~= 1 & fuse ~= 0 











X = setdiff(B(:,j),! 
DiffClusters{time, j 











else 
end 
h = columnD(2)+1; 
katysnumber = 1; 
else end 
else end 
end 
h = h+tl1; 
end 
else time == timemat (sizetimemat(1))j; 


CurrentClusters{1,j} = B(:,j); 


J = gly 
end 
B= C; 
C =D; 
if mm < sizetimemat (1) 
time = timemat (mm+1); 
else 
time = max(timemat)+1; 
end 
end 
return 
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funetion 
Postprocessing(MovingClusters,FuseClusters, DiffClusters,... 
CurrentClusters,timecount, Data,maxtime,clusters1) 














% Postprocessing Function to handle the development of usable 
text and visual outputs. 
% Postprocessing creates user-friendly text and visual outputs 


from the cell array inputs to provide the user an understanding 
of the algorithm’s findings. 
% Created by LT Kristofer Tester, USN, April 2013 





Cle 

moveon = 0; 

ZZ = 1; 

sizeMovingClusters = size(MovingClusters) ; 





while zz <= sizeMovingClusters(1)*sizeMovingClusters (2) 
if any (MovingClusters{zz}) 
moveon = 1; 


ZZ = SsizeMovingClusters(1)*sizeMovingClusters(2)+1; 





ZZ = 2z+1; 
end 
end 
if moveon == 1 
CompClusters = MovingClusters; 
[]; 
{}; 
time = 1; 
1? 





clusterl = []; 


= ("Moving Clusters:'); 
disp(str) 
= (' '); 
disp(str) 
while time <= sizeMovingClusters (1) 
h = 1; 
while h <= sizeMovingClusters (2) 
if any (MovingClusters{time,h}) 
b = nonzeros (MovingClusters{time,h})j; 
index = 1; 
occur = []; 
while index <= 
sizeMovingClusters(1)*sizeMovingClusters (2) 
if any(intersect (MovingClusters{index},b) ) 
J = nonzeros (MovingClusters{index}) ; 
sizeindex = size(J); 
sizeb = size(b); 
if sizeindex(1) == sizeb(1) 
Le J == 
[I,J] = 
ind2sub(sizeMovingClusters, index) ; 
occur = [occur;I]; 
MovingClusters{index} =.. 
MovingClusters{index}*0; 
index = index+1; 
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else 
index = index+1; 
end 
else 
index = index+1; 
end 
else 
index = index+1; 
end 
end 
if any(occur) 
1 = 141; 
occur = sort(occur); 
sizeoccur = size(occur); 
b = nonzeros(b); 
sizeb = size(b); 
sizeCurrentClusters = size(CurrentClusters); 
current = Q; 
q= 1; 
while q <= sizeCurrentClusters (2) 
Q = nonzeros(CurrentClusters{q}); 
sizeq = size(Q); 
if sizeq(1) == sizeb(1) 
if nonzeros(CurrentClusters{q}) == 





nonzeros(b) 
current = 1; 
gq = sizeCurrentClusters(2)+1; 
else 


q= qt; 


q = qtl; 


end 
if current == 1; 
occur = [occur;maxtime]; 
if length(occur) == max(occur)- 
min(occur)+1; 
str = ['Cluster ' num2str(l)... 
" containing contacts 
Y,;num2ser (O°): jp :0a.« 
' begins at time 
",num2str(occur(1)),... 
’ and is a current cluster’); 
else 
str = ['Cluster ' num2str(l)... 
" containing contacts 
",;numMZ2ser (b" p06 
" begins at time 
",num2str (occur (1.) ) >. ss 
', is a current cluster, but gains or loses 
members']; 


end 
disp(str) 
current = 0; 
else 
str = [*Cluster * num2Zstr(1)... 
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" containing contacts 
* num2str (b' ) 7 «5 
" begins at time ', 
num2str(occur(1)),... 
" and ends at time ', 
num2str (occur (sizeoccur(1)))]; 
disp(str) 
end 


% Average the cluster-contacts into a 
representative cluster at each timestep 
they occur in order to store the 
information for presentation on the visual 
OutLpUE. 

sizeoccur = size(occur); 

mm = 0; 

y = occur(1); 

while y <= max(occur) 

mm = mmt+i1; 




















z= 1; 
clust = []; 
while z <= sizeb(1) 
K = find(Data(:,7)==b(z)); 
sizeK = size(K); 
if sizeK == 1; 
clust = [clust;Data(K,:)]; 
Z = 2+1; 
else 
clust = [clust;Data(K(mm),:)]; 
Zz = z4+1; 
end 
end 
avgxpos = mean(clust(:,1)); 
avgypos = mean(clust(:,2)); 
avgdx = mean(clust(:,3)); 
avgdy = mean(clust(:,4)); 
avglon = mean(clust(:,9)); 
avglat = mean(clust(:,10)); 
avgcse = mean(clust(:,5)); 
avgspd = mean(clust(:,6)); 
clusterl = [clusterl;avgxpos avgypos 


avgdx avgdy... 
y FuseClusters{time,h} 1 avglon 
avglat... 
avgcse avgspd]; 
if mm < sizeoccur (1) 
1); 





y = occur (mm+ 
else 
y = max(occur)+1; 
end 
end 
d= 1; 


while d <= length(CurrentClusters) 


( 
kk = nonzeros(CurrentClusters{d}); 
11 = HONEST OSC TUee ia 1) 1s 
sizekk = size(kk); 
sizell = size(l1l); 








if sizekk == sizell 
if length(intersect (nonzeros... 


(CurrentClusters{d}),nonzeros... 


(clust(:,7))))/length(nonzeros... 
(clust(:,7))) == 1 
f = nonzeros(clust(:,7)); 
g= 1; 
while g <= length(f) 
row = find(clustersl(:,7) == 
£(g),1); 


clustersl(row,:) = 
clusters1 (row, :)*0; 





g = gtl; 
end 
else end 
d = d+l1; 
else 
d = d+l1; 
end 
end 
sizeclusterl = size(clusterl1); 
clusterl = 
[clusterl;zeros(1,sizecluster1(2))]; 
end 
else 
h = htl; 
end 
end 
time = timet+1; 
end 
sizeclusterl = size(clusterl1); 
o = max(clustersl1(:,8)); 
r= 1; 
b= []; 
1 = 1+1; 


% Reorganize the matrix clusterl for administrative purposes. 

% Determine the maximum and minimum of the cardinal 
directions for plotting purposes. Call the moneyscatter function 
to create an interactive visual output the function 
plot_google_map will lay the google map representation of the 
area underneath the output plot for better user situational 
awareness. 























clusterl = [clusterl(:,8) clusterl(:,9) clusterl(:,3) 
clusterl(:,4)... 
clusterl(:,5) clusterl(:,6) clusterl(:,7) 
clusterl1(:,10) 
ec 








lusterl(:,11)]; 











west = min(nonzeros((clusterl(:,1)))); 
east = max(nonzeros((clusterl(:,1)))); 
south = min(nonzeros((clusterl(:,2)))); 
north = max(nonzeros((clusterl(:,2)))); 








PlotScatter(cluster1) 
title('Interactive Visual Cluster Representation') 
xlabel('Degrees of Longitude") 


22 





ylabel('Degrees of Latitude") 

axis([west-0.5 east+0.5 south-0.5 north+0.5]); 
set(gcf,'color','w'); 

plot_google_map 





{2} 


% Using the cell array DiffClusters, determine which contacts 
join and depart other moving clusters throughout the timeline of 
analysis. This information is then output in text format for the 
user to better determine potential contacts of interest. 


str = (' '); 











tr) 
('Contacts of Interest:'); 


str Cae ie 
disp(str) 
h = 1; 
sizeDiffClusters = size(DiffClusters); 
sizeDC = sizeDiffClusters(1)*sizeDiffClusters (2); 
while h <= sizeDC 
if any(DiffClusters{h}); 
occur = []; 
m = nonzeros(DiffClusters{h}); 
d= ly 
sizeCompClusters = size(CompClusters) ; 
sizeCC = sizeCompClusters(1)*sizeCompClusters (2); 
while j <= sizeCC 
if any(intersect (m, CompClusters{j})); 























[I,J] = ind2sub(sizeCompClusters, j); 
occur = [occur;I1]; 
J = jtl; 
else 
J = j+l; 
end 
end 
dep = max(occur);% + 1; 
current = 0; 
if length(occur) == timecount-1 
if length(m) > 1 
str = ['Contacts ' num2str(m').. 


" are a moving cluster that join and depart other larger 
elusters:..' ]; 
disp(str) 
else 
str = ['Contact ' num2str(m')... 
" is a moving cluster that joins and departs other larger 
elusters.']; 








disp(str) 

end 
else 

kk = 1; 
sizem = size(m); 
while kk <= sizem(1) 

if any(find((Data(:,7) == m(kk)) & 

(Data(:,8)... 
== dep)) Jy 
s(kk) = find((Data(:,7) == m(kk)) & 
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(Data(:,8)... 














== dep) )7 
kk = kk+1; 
current = 1; 
else 
kk = kk+1; 
end 
end 
end 
if current == 1 
course = 0; 
speed = 0; 
sizes = size(s); 
kk = kk-1; 
if sizes(1) == 1; 
course = Data(s(1),5); 
speed = Data(s(1),6); 
else 
while kk >= 1 
course = (course+Data(s(kk),5))/2; 
speed = (speed+Data(s(kk),6))/2; 
kk = kk-1; 
end 
end 
if length(occur) == max(occur)-min(occur) +1 
str = ['Contact ' num2str(m')... 


" joins a cluster at time 


num2str(min(occur))... 


" and remains with the cluster until it departs at 


time '... 
num2str(max(occur)) '. 
course ' 


num2str(round(course)) ' 


at average speed of 


"'The contact departs on average 


num2str(round(speed)) ' knots.']; 
disp(str) 
else 
str = ['Contact ' num2str(m')... 


" joins a cluster at time 


num2str(min(occur))... 
', departs the cluster, 
at time '... 


num2str(max(occur)) '. 


and then rejoins it, 


finally departing 


"'The contact departs on average course 


num2str(round(course)) ' at average speed of ' 


num2str (round(speed))... 
" kote.” 
disp(str) 
end 
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end 


return 
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function PlotScatter(cluster1) 


6 PlotScatter Interactive scatterplot output 
% PlotScatter(clusterl) returns an interactive scatterplot of 


the information stored in the clusterl matrix. The user is able 
to interactively click on the scatterplot to determine 
information from various data points. 

% Created by LT Kristofer Tester, USN, April 2013 


% Set default line style for the scatterplot 
set (0, 'DefaultAxesLineStyleOrder', {'-','-','-','-'}) 











Fg 


% Open new figure 
fh = figure(); 





% Plot various colors on a far reach of the figure to enable 
labeling in the legend 
hl = plot(le6,1le6,'r'); 





hold on 

h2 = plot(le6,1le6,'color',[1 .5 0]); 
h3 = plot(le6,1le6,'g'); 

h4 = plot(le6,1le6,'color',[0 .5 .5]); 
h5 = plot(le6,1le6,'color',[.25 0 1]); 


% Loop through each row of clusterl and plot each cluster in the 


appropriate color and style according to its attributes 
sizeclusterl = size(clusterl1); 
while n < sizecluster1(1) 


£ 


if any(clusterl(n,:)) 











if clusterl(n+1,5) == 0; 
Lf == || (clusterl(n+1,5) == 0 & clusterl(n-1,5 
== 0) 
if clusterl(n,6) == 1; 
if any (xy) 
xy = [xy;clusterl(n,1) clusterl(n,2)]; 
plot (xy(:,1),xy(2,2),"rxu-"); 
hold on 
grid on 
xy = []; 
else end 
plot (clusterl(n,1),clusterl(n,2),'ro') 
hold on 
grid on 
elseif clusterl(n,6) >= 0.75 & clusterl(n,6) < 
if any (xy) 
xy = [xy;clusterl(n,1) clusterl(n,2)]; 
plot(xy'(s>,1),xy¥ (2,2), 'xs", “eolor™,. [1 .5 
O01); 
hold on 
grid on 
xy = []; 
else end 
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) 


1; 


plot (clusterl1(n,1),clusterl(n,2),'o','color',[1 .5 0]) 














hold on 
grid on 
elseif clusterl(n,6) >= 0.5 & clusterl(n,6) < 
O27 52 
if any (xy) 
xy = [xy;clusterl(n,1) clusterl(n,2)]; 
plot (xy(:,1),xy(:,2),'gx-'); 
hold on 
grid on 
xy = []; 
else end 
plot (clusterl(n,1),clusterl(n,2),'go"') 
hold on 
grid on 
elseif clusterl(n,6) >= 0.25 & clusterl(n,6) < 
0...5% 
if any(xy) 
xy = [xy;clusterl(n,1) clusterl(n,2)]; 
plot (xy (2 ¢1),x%y (272); "R=", eoler’,; (0 .5 
S]); 
hold on 
grid on 
xy = []; 
else end 


plot (clusterl1(n,1),clusterl(n,2),'o','color', [0 .5 .5]) 











hold on 
grid on 
elseif clusterl(n,6) > 0 & clusterl(n,6) < 0.25; 
if any (xy) 
xy = [xy;clusterl(n,1) clusterl(n,2)]; 
plot (xy (2,1) ¢ky (242) 7 RH"; Vee lor"; [25 -0 
1]); 
hold on 
grid on 
xy = []; 
else end 


plot (clusterl1(n,1),clusterl(n,2),'o','color',[.25 0 1]) 
hold on 
grid on 
else clusterl(n,6) == 
scatter (clusterl(n,1),clusterl(n,2),'ko') 











hold on 
grid on 
end 
n = n+1; 
else 
if clusterl(n-1,6) == 1; 
if any (xy) 
xy = [xy;clusterl(n,1) clusterl(n,2)]; 
plot (xy(:,1),xy(:,2),'rx-'); 
hold on 
grid on 
xy = []; 
else end 


| 


plot (clusterl1(n,1),clusterl(n,2),'ro"') 
hold on 
grid on 


= 


elseif 





F 4 


Le 


clusterl(n-1,6) 


any (xy) 
xy 
plot (xy(:,] 


hold on 
grid on 
xy = []; 


else end 


plot (clusterl(n,1),clusterl(n,2),'o', 





hold on 
grid on 


elseif clusterl1(n-1,6) 





£ 


if 





at 


(xy) 
= [ 


lse end 


clusterl(n-1,6) 


any (xy) 
XY [xy;cl 
plot (xy(:,] 


hold on 
grid on 
xy = []; 


else end 


plot (clusterl(n,1),clusterl(n,2),'o', 








hold on 
grid on 
elseif clusterl1(n-1,6) 
0.25; 
if any (xy) 
xy = [xy;cl 
plot (xy(:,1 
1)); 
hold on 
grid on 
xy = [l]; 
else end 


plot (clusterl(n,1),clusterl(n,2),'o', 








hold on 
grid on 
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[xy;clusterl1(n,1) 
Ll), xy (2,2), *S=", *eolor’ ,.[1 





lusterl(n,1) 
1),xy(:,2),'gx-'); 





lusterl(n,1) 





lusterl(n,1) 


Ll), SY (i 2) 7 


“eoome™ ;. [th 


“eoduor”; [.0 


>= 0.75 & clusterl (n, 6) 


< 


clusterl(n,2)]; 


wo 


>= 0.5 & clusterl(n,6) 


“0 


0]) 


clusterl(n,2)]; 


>= 0.25 & clusterl(n, 6) 


< 


clusterl(n,2)]; 


lL) 7 VV C2 2) 57 Va, Veeder’ 0 


ao 


> 0 & clusterl(n,6) 


“D 


-9]) 


< 


clusterl(n,2)]; 
x-', 


‘peta. 2215: 0 


'eolorl;, [25 °Q. a]') 


else clusterl(n-1,6) == 
scatter (clusterl(n,1),clusterl(n,2),'ko') 


hold on 
grid on 
end 
n = n+1; 
end 
elseif clusterl(n+1,5) ~= 0; 
xy = [xy;clusterl(n,1) clusterl(n,2)]; 
n = n+l; 
end 
else 
n = n+l; 
end 
end 





% Define the legend properties and labels for each color 
legend([hl h2 h3 h4 h5],'100%', '75-99%', "50-74%; ', '25-49%', '0- 
% Enable data cursor mode in the figure and call the function to 
enable interactive use and response from the figure 

dem = datacursormode (fh); 

datacursormode on 

set (dem, 'displaystyle', 'window'); 

set (dem, 'UpdateFcn', {@poscluster,cluster1}) 











6 Function call to provide the information for data points on the 
scatterplot based on where the data cursor is. 
function InfoBox = poscluster(obj,event_obj,clusterl1) 





{o} 


% Determine the position of the user mouse click and store in pos 
pos = get(event_obj, 'Position'); 





% Define the portions of pos that correspond to the x and y 


{2} 


% Search the clusterl matrix for the contact in the selected x 
and y posiition 


a = find(clusterl(:,1) == x); 

b = find(clusterl(:,2) == y); 

c = find(clusterl(:,5) ~= 0); 

% Set the variable row to the contact in the given x and y 
position 

row = intersect(a,b); 

row = intersect (row,c); 


% Set output text accordingly (Could be anything we want it to 
be) 


InfoBox = {['Longitude: ' num2str(pos(1),4)],. 
['Latitude: ' num2str(pos(2),4)]. 
['Cluster: ' num2str(clusterl(row,7))]... 
['Time: ' num2str(clusterl(row,5))]... 
['Average Course: ' num2str(round(clusterl(row,8)))]... 
['Average Speed: ' num2str(round(clusterl(row,9)))]}; 
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